Chapter 1


Quantum Computing 


1.1 Introduction 


Today’s computers—both in theory (Turing machines) and practice (PCs, HPCs, laptops, tablets, 
smartphones, ...)—are based on classical physics. They are limited by locality (operations have 
only local effects) and by the classical fact that systems can be in only one state at a time.
However, modern quantum physics tells us that the world behaves quite differently. A quantum system
can be in a superposition of many different states at the same time, and can exhibit interference 
effects during the course of its evolution. Moreover, spatially separated quantum systems may be 
entangled with each other and operations may have “non-local” effects because of this. 

Quantum computation is the field that investigates the computational power and other proper- 
ties of computers based on quantum-mechanical principles. It combines two of the most important 
strands of 20th-century science: quantum mechanics (developed by Planck, Einstein, Bohr, Heisen- 
berg, Schrödinger and others in the period 1900-1925) and computer science (whose birth may be 
dated to Turing’s 1936 paper [237]). An important objective is to find quantum algorithms that 
are significantly faster than any classical algorithm solving the same problem. 

Quantum computation started in the early 1980s with suggestions for analog quantum comput- 
ers by Yuri Manin [185] (and appendix of [186]), Richard Feynman [111, 112], and Paul Benioff [47], 
and reached more digital ground when in 1985 David Deutsch defined the universal quantum Tur- 
ing machine [97]. See Preskill [201] for more on this early history. The following years saw only 
sparse activity, notably the development of the first algorithms by Deutsch and Jozsa [99] and by 
Simon [230], and the development of quantum complexity theory by Bernstein and Vazirani [53]. 
However, interest in the field increased tremendously after Peter Shor’s very surprising discovery 
of efficient quantum algorithms for the problems of integer factorization and discrete logarithms in 
1994 [228], which was inspired by Simon’s work. Since most of current classical cryptography is 
based on the assumption that these two problems are computationally hard, the ability to actually 
build and use a quantum computer would allow us to break most current classical cryptographic 
systems, notably the RSA system [210, 211]. In contrast, a quantum form of cryptography due to 
Bennett and Brassard [51] is unbreakable even for quantum computers. 

Here are three reasons to study quantum computers, from practical to more philosophical: 


1. The process of miniaturization that has made current classical computers so powerful and
cheap has already reached micro-levels where quantum effects occur. Chipmakers tend to go
to great lengths to suppress those quantum effects, forcing their bits and logical operations
to behave classically, but instead one might also try to work with them, enabling further
miniaturization.


2. Making use of quantum effects allows one to speed up certain computations enormously 
(sometimes exponentially), and even enables some things that are impossible for classical 
computers. The main purpose of these lecture notes is to explain these advantages of quantum 
computing (algorithms, crypto, etc.) in detail. 


3. Finally, one might say that the main goal of theoretical computer science is to “study the 
power and limitations of the strongest-possible computational devices that Nature allows 
us.” Since our current understanding of Nature is quantum mechanical, theoretical computer 
science should arguably be studying the power of quantum computers, not classical ones. 


Before limiting ourselves to theory, let us say a few words about practice: to what extent will 
quantum computers ever be built? At this point in time, it is just too early to tell. The first 
small 2-qubit quantum computer was built in 1997 and in 2001 a 5-qubit quantum computer was 
used to successfully factor the number 15 [240]. Since then, experimental progress on a number 
of different technologies has been steady but slow. The most advanced implementations currently 
use superconducting qubits and ion-trap qubits. The largest quantum computation done at the 
time of writing is Google’s “quantum supremacy” experiment on 53 qubits [30], which performs 
a complicated (but rather useless) sampling task that appears to be no longer simulatable in a
reasonable amount of time on even the largest existing classical supercomputer.

The practical problems facing physical realizations of quantum computers seem formidable. The 
problems of noise and decoherence have to some extent been solved in theory by the discovery of 
quantum error-correcting codes and fault-tolerant computing (see, e.g., Chapter 20 in these notes), 
but these problems are by no means solved in practice. On the other hand, we should realize 
that the field of physical realization of quantum computing is still in its infancy and that classical 
computing had to face and solve many formidable technical problems as well—interestingly, often 
these problems were even of the same nature as those now faced by quantum computing (e.g., 
noise-reduction and error-correction). Moreover, while the difficulties facing the implementation 
of a full quantum computer may seem daunting, more limited applications involving quantum 
communication have already been implemented with some success, for example teleportation (which 
is the process of sending qubits using entanglement and classical communication), and versions of 
BB84 quantum key distribution are nowadays even commercially available. 

Even if the theory of quantum computing never materializes to a real large-scale physical com- 
puter, quantum-mechanical computers are still an extremely interesting idea which will bear fruit 
in other areas than practical fast computing. On the physics side, it may improve our understand- 
ing of quantum mechanics. The emerging theories of entanglement and of Hamiltonian complexity 
have already done this to some extent. On the computer science side, the theory of quantum 
computation generalizes and enriches classical complexity theory and may help resolve some of its 
problems (see Section 15.3 for an example). 


1.2 Quantum mechanics 


Here we give a brief and abstract introduction to quantum mechanics. In short: a quantum state is 
a superposition of classical states, written as a vector of amplitudes, to which we can apply either 
a measurement or a unitary operation. For the required linear algebra we refer to Appendix A. 


1.2.1 Superposition 


Consider some physical system that can be in N different, mutually exclusive classical states.
Because we will typically start counting from 0 in these notes, we call these states
$|0\rangle, |1\rangle, \ldots, |N-1\rangle$.
Roughly, by a “classical” state we mean a state in which the system can be found if we observe it.
A pure quantum state (usually just called state) $|\phi\rangle$ is a superposition of classical states, written

$$|\phi\rangle = \alpha_0|0\rangle + \alpha_1|1\rangle + \cdots + \alpha_{N-1}|N-1\rangle.$$

Here $\alpha_i$ is a complex number that is called the amplitude of $|i\rangle$ in $|\phi\rangle$. Intuitively, a system in
quantum state $|\phi\rangle$ is “in all classical states at the same time,” each state having a certain amplitude.
It is in state $|0\rangle$ with amplitude $\alpha_0$, in state $|1\rangle$ with amplitude $\alpha_1$, and so on. Mathematically,
the states $|0\rangle, \ldots, |N-1\rangle$ form an orthonormal basis of an N-dimensional Hilbert space (i.e., an
N-dimensional vector space equipped with an inner product). A quantum state $|\phi\rangle$ is a vector in
this space, usually written as an N-dimensional column vector of its amplitudes:

$$|\phi\rangle = \begin{pmatrix} \alpha_0 \\ \vdots \\ \alpha_{N-1} \end{pmatrix}.$$

Such a vector is sometimes called a “ket.” Its conjugate transpose is the following row vector,
sometimes called a “bra”:

$$\langle\phi| = (\alpha_0^*, \ldots, \alpha_{N-1}^*).$$

The reason for this terminology (often called “Dirac notation” after Paul Dirac) is that an inner
product $\langle\phi|\psi\rangle$ between two states corresponds to the dot product between a bra and a ket vector
(“bracket”): $\langle\phi|\psi\rangle = \langle\phi| \cdot |\psi\rangle$.

We can combine different Hilbert spaces using tensor product: if $|0\rangle, \ldots, |N-1\rangle$ are an
orthonormal basis of space $\mathcal{H}_A$ and $|0\rangle, \ldots, |M-1\rangle$ are an orthonormal basis of space $\mathcal{H}_B$, then the
tensor product space $\mathcal{H} = \mathcal{H}_A \otimes \mathcal{H}_B$ is an NM-dimensional space spanned by the set of states
$\{|i\rangle \otimes |j\rangle \mid i \in \{0, \ldots, N-1\}, j \in \{0, \ldots, M-1\}\}$. An arbitrary state in $\mathcal{H}$ is of the form
$\sum_{i=0}^{N-1} \sum_{j=0}^{M-1} \alpha_{ij} |i\rangle \otimes |j\rangle$. Such a state is called bipartite. Similarly we can have tripartite states
that “live” in a Hilbert space that is the tensor product of three smaller Hilbert spaces, etc.
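
To make the vector picture concrete, here is a minimal Python sketch that stores a ket as a plain list of amplitudes. The helper names `inner` and `tensor` are illustrative assumptions rather than notation from these notes, and placing the amplitude of $|i\rangle \otimes |j\rangle$ at position i·M + j is one common ordering convention.

```python
import math

def inner(phi, psi):
    """Inner product <phi|psi>: conjugate the entries of phi (the bra), then take the dot product."""
    return sum(complex(a).conjugate() * b for a, b in zip(phi, psi))

def tensor(phi, psi):
    """Tensor product of two amplitude vectors; entry i*M + j holds phi[i] * psi[j]."""
    return [a * b for a in phi for b in psi]

s = 1 / math.sqrt(2)
ket0 = [1, 0]          # |0>
ket1 = [0, 1]          # |1>
plus = [s, s]          # (|0> + |1>)/sqrt(2)

ket01 = tensor(ket0, ket1)        # |0>|1> lives in a 2*2 = 4-dimensional space
print(ket01)                      # the single nonzero amplitude sits at position 0*2 + 1
print(inner(ket0, ket1))          # distinct basis states are orthogonal
print(abs(inner(plus, plus)))     # a normalized state has inner product 1 with itself
```

Taking the tensor product of three such vectors gives a tripartite state in the same way, with the dimensions multiplying.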

There are two things we can do with a quantum state: measure it or let it evolve unitarily 
without measuring it. We will deal with measurement first. 


1.2.2 Measurement 
Measurement in the computational basis 


Suppose we measure state $|\phi\rangle$. We cannot “see” a superposition itself, but only classical states.
Accordingly, if we measure state $|\phi\rangle$ we will see one and only one classical state $|j\rangle$. Which specific
$|j\rangle$ will we see? This is not determined in advance; the only thing we can say is that we will
see state $|j\rangle$ with probability $|\alpha_j|^2$, which is the squared norm of the corresponding amplitude $\alpha_j$.
This is known as “Born’s rule.” Accordingly, observing a quantum state induces a probability
distribution on the classical states, given by the squared norms of the amplitudes. This implies
$\sum_{j=0}^{N-1} |\alpha_j|^2 = 1$, so the vector of amplitudes has (Euclidean) norm 1. If we measure $|\phi\rangle$ and get
outcome j as a result,¹ then $|\phi\rangle$ itself has “disappeared,” and all that is left is $|j\rangle$. In other words,
observing $|\phi\rangle$ “collapses” the quantum superposition $|\phi\rangle$ to the classical state $|j\rangle$ that we saw, and
all “information” that might have been contained in the amplitudes $\alpha_i$ is gone. Note that the
probabilities of the various measurement outcomes are exactly the same when we measure $|\phi\rangle$ or
when we measure state $e^{i\theta}|\phi\rangle$; because of this we sometimes say that the “global phase” $e^{i\theta}$ has no
physical significance.

¹ Don’t use the ambiguous phrase “we measure j” in this case, since it’s not clear in that phrasing whether $|j\rangle$ is
the state you’re applying the measurement to, or the outcome of the measurement.
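
Born’s rule and the irrelevance of a global phase can be checked numerically. The sketch below is only an illustration: the function names, the example state, and the angle 0.7 are assumptions chosen for the demonstration, and the sampling uses Python’s standard pseudo-random generator rather than any physical process.

```python
import cmath
import math
import random

def outcome_probabilities(amplitudes):
    """Born's rule: outcome j occurs with probability |alpha_j|^2."""
    return [abs(a) ** 2 for a in amplitudes]

def measure(amplitudes, rng=random):
    """Sample an outcome j; the post-measurement state is the basis state |j>."""
    probs = outcome_probabilities(amplitudes)
    j = rng.choices(range(len(probs)), weights=probs)[0]
    collapsed = [1 if k == j else 0 for k in range(len(probs))]
    return j, collapsed

phi = [math.sqrt(1 / 3), 1j * math.sqrt(2 / 3)]       # example 1-qubit state
theta = 0.7                                           # arbitrary global phase angle
phased = [cmath.exp(1j * theta) * a for a in phi]     # e^{i theta}|phi>

print(outcome_probabilities(phi))       # approximately 1/3 and 2/3
print(outcome_probabilities(phased))    # the same distribution, up to rounding
print(measure(phi))                     # a random outcome with its collapsed state
```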


Projective measurement 


For most of the topics in these notes, the above “measurement in the computational (or standard) 
basis” suffices. However, somewhat more general kinds of measurement than the above are possible 
and sometimes useful. The remainder of this subsection may be skipped on a first reading, but will 
become more relevant in the later parts of these notes, starting from Chapter 15. 

A projective measurement on some space, with m possible outcomes, is a collection of projectors
$P_1, \ldots, P_m$ that all act on that same space and that sum to identity, $\sum_{j=1}^m P_j = I$.² These projectors
are then pairwise orthogonal, meaning that $P_i P_j = 0$ if $i \neq j$. The projector $P_j$ projects on some
subspace $V_j$ of the total Hilbert space V, and every state $|\phi\rangle \in V$ can be decomposed in a unique
way as $|\phi\rangle = \sum_{j=1}^m |\phi_j\rangle$, with $|\phi_j\rangle = P_j|\phi\rangle \in V_j$. Because the projectors are orthogonal, the
subspaces $V_j$ are orthogonal as well, as are the states $|\phi_j\rangle$. When we apply this measurement to
the pure state $|\phi\rangle$, then we will get outcome j with probability
$\| |\phi_j\rangle \|^2 = \mathrm{Tr}(P_j |\phi\rangle\langle\phi|) = \langle\phi|P_j|\phi\rangle$
and the measured state will then “collapse” to the new state
$|\phi_j\rangle / \| |\phi_j\rangle \| = P_j|\phi\rangle / \| P_j|\phi\rangle \|$.³ The
probabilities sum to 1 thanks to our assumption that $\sum_{j=1}^m P_j = I$ and the fact that trace is a
linear function:

$$\sum_{j=1}^m \mathrm{Tr}(P_j |\phi\rangle\langle\phi|) = \mathrm{Tr}\Big(\Big(\sum_{j=1}^m P_j\Big) |\phi\rangle\langle\phi|\Big) = \mathrm{Tr}(|\phi\rangle\langle\phi|) = \langle\phi|\phi\rangle = 1.$$

Note carefully that we cannot choose which $P_j$ will be applied to the state but can only give a
probability distribution. However, if the state $|\phi\rangle$ that we measure lies fully within one of the
subspaces $V_j$, then the measurement outcome will be that j with certainty.

For example, a measurement in the computational basis on an N-dimensional state is the specific
projective measurement where m = N and $P_j = |j\rangle\langle j|$. That is, $P_j$ projects onto the computational
basis state $|j\rangle$ and the corresponding subspace $V_j \subseteq V$ is the 1-dimensional subspace spanned by
$|j\rangle$. Consider the state $|\phi\rangle = \sum_{j=0}^{N-1} \alpha_j|j\rangle$. Note that $P_j|\phi\rangle = \alpha_j|j\rangle$, so applying our measurement
to $|\phi\rangle$ will give outcome j with probability $\| \alpha_j|j\rangle \|^2 = |\alpha_j|^2$, and in that case the state collapses
to $\alpha_j|j\rangle / \| \alpha_j|j\rangle \| = \frac{\alpha_j}{|\alpha_j|}|j\rangle$. The norm-1 factor $\frac{\alpha_j}{|\alpha_j|}$ may be disregarded because it has no physical
significance, so we end up with the state $|j\rangle$ as we saw before.

Instead of the standard orthonormal basis, with basis states $|0\rangle, \ldots, |N-1\rangle$, we may consider
any other orthonormal basis B of states $|\psi_0\rangle, \ldots, |\psi_{N-1}\rangle$, and consider the projective measurement
defined by the projectors $P_j = |\psi_j\rangle\langle\psi_j|$. This is called “measuring in basis B.” Applying this
measurement to state $|\phi\rangle$ gives outcome j with probability $\langle\phi|P_j|\phi\rangle = |\langle\phi|\psi_j\rangle|^2$. Note that if $|\phi\rangle$
equals one of the basis vectors $|\psi_j\rangle$, then the measurement gives that outcome j with probability 1.

In the previous two examples the projectors had rank 1 (i.e., project on 1-dimensional subspaces),
but this is not necessary. For example, a measurement that distinguishes between $|j\rangle$
with $j < N/2$ and $|j\rangle$ with $j \geq N/2$ corresponds to the two projectors $P_1 = \sum_{j < N/2} |j\rangle\langle j|$ and
$P_2 = \sum_{j \geq N/2} |j\rangle\langle j|$, each of rank N/2 (assume N is even and at least 4). Applying this measurement to the state
$|\phi\rangle = \frac{1}{\sqrt{3}}|1\rangle + \sqrt{\frac{2}{3}}|N-1\rangle$ gives outcome 1 with probability $\| P_1|\phi\rangle \|^2 = 1/3$, in which case the state
collapses to $|1\rangle$. It gives outcome 2 with probability $\| P_2|\phi\rangle \|^2 = 2/3$; the state then collapses to $|N-1\rangle$.

² The m projectors together form one measurement; don’t use the word “measurement” for individual $P_j$s.
³ Don’t confuse the outcome of the measurement, which is the label j of the projector $P_j$ that was applied, and
the post-measurement state, which is $P_j|\phi\rangle / \| P_j|\phi\rangle \|$.
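
The two-projector example can be worked through numerically. The following is a sketch under stated assumptions: it fixes N = 4, takes the second basis state in the superposition to be the last one ($|3\rangle$), and represents each projector by the set of basis indices it keeps. The function names are hypothetical.

```python
import math

def norm(v):
    return math.sqrt(sum(abs(a) ** 2 for a in v))

def project(state, indices):
    """Apply the projector sum_{j in indices} |j><j|: keep those amplitudes, zero the rest."""
    return [a if j in indices else 0 for j, a in enumerate(state)]

def projective_measurement(state, subspaces):
    """Return (probability, post-measurement state) for each projector."""
    results = []
    for indices in subspaces:
        projected = project(state, indices)
        length = norm(projected)
        post = [a / length for a in projected] if length > 0 else None
        results.append((length ** 2, post))
    return results

N = 4
state = [0, math.sqrt(1 / 3), 0, math.sqrt(2 / 3)]     # (1/sqrt 3)|1> + sqrt(2/3)|3>
halves = [set(range(N // 2)), set(range(N // 2, N))]   # j < N/2 and j >= N/2
for p, post in projective_measurement(state, halves):
    print(round(p, 6), post)
```

The first result has probability about 1/3 with all remaining weight on $|1\rangle$; the second has probability about 2/3 with all weight on $|3\rangle$.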


Observables 


A projective measurement with projectors $P_1, \ldots, P_m$ and associated distinct outcomes $\lambda_1, \ldots, \lambda_m \in \mathbb{R}$,
can be written as one matrix $M = \sum_{i=1}^m \lambda_i P_i$, which is called an observable. This is a succinct
way of writing down the projective measurement in one matrix, and has the added advantage that
the expected value of the outcome can be easily calculated: if we are measuring a state $|\phi\rangle$, then
the probability of outcome $\lambda_i$ is $\| P_i|\phi\rangle \|^2 = \mathrm{Tr}(P_i |\phi\rangle\langle\phi|)$, so the expected value of the outcome is
$\sum_{i=1}^m \lambda_i \mathrm{Tr}(P_i |\phi\rangle\langle\phi|) = \mathrm{Tr}(\sum_{i=1}^m \lambda_i P_i |\phi\rangle\langle\phi|) = \mathrm{Tr}(M |\phi\rangle\langle\phi|)$. Note that M is Hermitian: $M = M^*$.
Conversely, since every Hermitian M has a spectral decomposition $M = \sum_{i=1}^m \lambda_i P_i$, there is a direct
correspondence between observables and Hermitian matrices.

The Pauli matrices I, X, Y, Z (see Appendix A.9) are examples of 2-dimensional observables,
with eigenvalues ±1. For example, $Z = |0\rangle\langle 0| - |1\rangle\langle 1|$ corresponds to measurement in the
computational basis (with measurement outcomes +1 and −1 for $|0\rangle$ and $|1\rangle$, respectively).

Suppose we have a bipartite state. An observable A on the first part of the state corresponds
to an observable $A \otimes I$ on the bipartite state. Similarly, an observable B on the second part of the
state corresponds to an observable $I \otimes B$ on the bipartite state. Separately measuring observables
A and B on the two parts of a bipartite state is different from measuring the joint observable
$A \otimes B$: the separate measurements give one outcome each, while the joint measurement gives
only one outcome, and the distribution on the post-measurement state may be different. What is
true, however, is that the measurement statistics of the product of outcomes is the same as the
measurement statistics of the outcome of the joint measurement. For example consider the case
when A = B = Z (these correspond to measurement in the computational basis), and the bipartite
state is $|\phi\rangle = \frac{1}{\sqrt{2}}(|0\rangle \otimes |0\rangle + |1\rangle \otimes |1\rangle)$. With the separate measurements, the outcomes will be
++ or −− (note that in both cases the product of the two outcomes is +1) and the state $|\phi\rangle$ will
collapse to either $|0\rangle \otimes |0\rangle$ or $|1\rangle \otimes |1\rangle$. Yet $|\phi\rangle$ remains undisturbed by a joint measurement with
±1-valued observable $Z \otimes Z$, because $|\phi\rangle$ is a +1-eigenstate of $Z \otimes Z$.
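
The claims about $Z \otimes Z$ can be verified with a few lines of Python. This is a minimal sketch, assuming matrices are stored as lists of rows and the state as a list of four amplitudes; `kron`, `matvec` and `expectation` are hypothetical helper names.

```python
import math

def kron(A, B):
    """Tensor (Kronecker) product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def expectation(M, phi):
    """Expected outcome <phi|M|phi> = Tr(M |phi><phi|) for observable M."""
    return sum(complex(a).conjugate() * b for a, b in zip(phi, matvec(M, phi))).real

I2 = [[1, 0], [0, 1]]
Z = [[1, 0], [0, -1]]
s = 1 / math.sqrt(2)
state = [s, 0, 0, s]                     # (|0>|0> + |1>|1>)/sqrt(2)

ZZ = kron(Z, Z)
print(matvec(ZZ, state))                 # equals the state itself: a +1-eigenstate of Z (x) Z
print(expectation(ZZ, state))            # expected joint outcome is +1, up to rounding
print(expectation(kron(Z, I2), state))   # Z on the first part alone averages to 0
```

The last line reflects that a separate Z measurement on the first part gives +1 and −1 equally often, even though the product of the two separate outcomes is always +1.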


POVM measurement 


If we only care about the final probability distribution on the m outcomes, not about the resulting
post-measurement state, then the most general type of measurement we can do is a so-called
positive-operator-valued measure (POVM). This is specified by m positive semidefinite (psd)
matrices $E_1, \ldots, E_m$ that sum to identity. When measuring a state $|\phi\rangle$, the probability of outcome
i is given by $\mathrm{Tr}(E_i |\phi\rangle\langle\phi|)$. A projective measurement is the special case of a POVM where the
measurement elements $E_i$ are projectors.⁴

There are situations where a POVM can do things a projective measurement cannot do.⁵ For
example, suppose you have a state in a 2-dimensional space, and you know it is either in state $|0\rangle$
or in state $|+\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$. These two states are not orthogonal, so there is no measurement
that distinguishes them perfectly. However, there is a POVM measurement that never makes a
mistake, but sometimes gives another outcome 2, meaning “I don’t know.” That is, you would like
to do a measurement with three possible outcomes: 0, 1, and 2, such that:

• If the state is $|0\rangle$, then you get correct outcome 0 with probability 1/4, and outcome 2 with
probability 3/4, but never get incorrect outcome 1.

• If the state is $|+\rangle$, then you get correct outcome 1 with probability 1/4, and outcome 2 with
probability 3/4, but never get incorrect outcome 0.

You cannot achieve this with a projective measurement on the qubit, but the following 3-outcome
POVM does the job:

$E_0 = \frac{1}{2}|-\rangle\langle -|$ (where $|-\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$, which is orthogonal to the $|+\rangle$ state);
$E_1 = \frac{1}{2}|1\rangle\langle 1|$ (note that this is orthogonal to the $|0\rangle$ state);
$E_2 = I - E_0 - E_1$.

You can check that $E_0, E_1, E_2$ are psd and add up to identity, so they form a valid POVM. None of
the 3 matrices is a projector. The success probability 1/4 can be improved further, see Exercise 9.

⁴ Note that if $E_i$ is a projector, then $\mathrm{Tr}(E_i |\phi\rangle\langle\phi|) = \mathrm{Tr}(E_i^2 |\phi\rangle\langle\phi|) = \mathrm{Tr}(E_i |\phi\rangle\langle\phi| E_i) = \| E_i|\phi\rangle \|^2$, using the fact
that $E_i = E_i^2$ and the cyclic property of the trace. These equalities can fail if $E_i$ is psd but not a projector.

⁵ Even though POVMs strictly generalize projective measurements, one can show that every POVM can be
“simulated” by a projective measurement on a slightly larger space that yields the exact same probability distribution
over measurement outcomes (this follows from Neumark’s theorem).
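
The check suggested above can be carried out numerically. The sketch below assumes the coefficients of $E_0$ and $E_1$ are both 1/2, which is the value consistent with the stated success probability 1/4; the helper names are hypothetical, and the psd test relies on the standard fact that a real symmetric 2 × 2 matrix is psd exactly when its trace and determinant are nonnegative.

```python
import math

def add(A, B, sign=1):
    return [[a + sign * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def outer(v, scale=1.0):
    """scale * |v><v| for a real vector v."""
    return [[scale * a * b for b in v] for a in v]

def probability(E, phi):
    """Tr(E |phi><phi|) = <phi|E|phi> for a real 2-dimensional vector phi."""
    return sum(phi[i] * E[i][j] * phi[j] for i in range(2) for j in range(2))

def is_psd_2x2(E, tol=1e-12):
    """Real symmetric 2x2 matrix: psd iff trace >= 0 and determinant >= 0."""
    trace = E[0][0] + E[1][1]
    det = E[0][0] * E[1][1] - E[0][1] * E[1][0]
    return trace >= -tol and det >= -tol

s = 1 / math.sqrt(2)
ket0, ket1, plus, minus = [1, 0], [0, 1], [s, s], [s, -s]
identity = [[1, 0], [0, 1]]

E0 = outer(minus, 0.5)                       # (1/2)|-><-|
E1 = outer(ket1, 0.5)                        # (1/2)|1><1|
E2 = add(add(identity, E0, -1), E1, -1)      # I - E0 - E1

for name, state in (("|0>", ket0), ("|+>", plus)):
    print(name, [round(probability(E, state), 6) for E in (E0, E1, E2)])
print(all(is_psd_2x2(E) for E in (E0, E1, E2)))
```

For $|0\rangle$ the three outcome probabilities come out as 1/4, 0, 3/4, and for $|+\rangle$ as 0, 1/4, 3/4, matching the two bullet points.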


1.2.3 Unitary evolution 


Instead of measuring $|\phi\rangle$, we can also apply some operation to it, i.e., change the state to some

$$|\psi\rangle = \beta_0|0\rangle + \beta_1|1\rangle + \cdots + \beta_{N-1}|N-1\rangle.$$

Quantum mechanics only allows linear operations to be applied to quantum states. What this
means is: if we view a state like $|\phi\rangle$ as an N-dimensional vector $(\alpha_0, \ldots, \alpha_{N-1})^T$, then applying an
operation that changes $|\phi\rangle$ to $|\psi\rangle$ corresponds to multiplying $|\phi\rangle$ with an N × N complex-valued
matrix U:

$$U \begin{pmatrix} \alpha_0 \\ \vdots \\ \alpha_{N-1} \end{pmatrix} = \begin{pmatrix} \beta_0 \\ \vdots \\ \beta_{N-1} \end{pmatrix}.$$

Note that by linearity we have $|\psi\rangle = U|\phi\rangle = U\left(\sum_{i=0}^{N-1} \alpha_i|i\rangle\right) = \sum_{i=0}^{N-1} \alpha_i U|i\rangle$.

Because measuring $|\psi\rangle$ should also give a probability distribution, we have the constraint
$\sum_{j=0}^{N-1} |\beta_j|^2 = 1$ on the new state. This implies that the operation U must preserve the norm
of vectors, and hence must be a unitary transformation (often just called “a unitary”). A matrix
U is unitary if its inverse $U^{-1}$ equals its conjugate transpose $U^*$. This is equivalent to saying that
U always maps a vector of norm 1 to a vector of norm 1. Because a unitary transformation always
has an inverse, it follows that any (non-measuring) operation on quantum states must be reversible:
by applying $U^{-1}$ we can always “undo” the action of U, and nothing is lost in the process. On the
other hand, a measurement is clearly non-reversible, because we cannot reconstruct $|\phi\rangle$ from the
observed classical state $|j\rangle$.
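
As a small numerical illustration of these properties, the sketch below checks $U^* U = I$, norm preservation, and reversibility for one example matrix. The rotation matrix, the angle 0.3, and the test state are arbitrary choices made for the demonstration, and the helper names are hypothetical.

```python
import math

def conjugate_transpose(U):
    return [[complex(U[r][c]).conjugate() for r in range(len(U))] for c in range(len(U[0]))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def matvec(U, v):
    return [sum(u * x for u, x in zip(row, v)) for row in U]

def is_unitary(U, tol=1e-12):
    """Check U* U = I entry by entry."""
    product = matmul(conjugate_transpose(U), U)
    n = len(U)
    return all(abs(product[i][j] - (1 if i == j else 0)) < tol
               for i in range(n) for j in range(n))

def norm(v):
    return math.sqrt(sum(abs(a) ** 2 for a in v))

theta = 0.3                                            # arbitrary rotation angle
U = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]               # a rotation matrix, which is unitary
phi = [0.6, 0.8j]                                      # a state of norm 1
psi = matvec(U, phi)                                   # |psi> = U|phi>
back = matvec(conjugate_transpose(U), psi)             # applying U* = U^{-1} undoes U

print(is_unitary(U), norm(psi))
print(back)
print(is_unitary([[1, 1], [0, 1]]))                    # an invertible but non-unitary matrix
```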


1.3 Qubits and quantum memory 


In classical computation, the unit of information is a bit, which can be 0 or 1. In quantum
computation, this unit is a quantum bit (qubit), which is a superposition of 0 and 1. Consider a system
with 2 basis states, call them $|0\rangle$ and $|1\rangle$. We identify these basis states with the two orthogonal
vectors $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$, respectively. A single qubit can be in any superposition

$$\alpha_0|0\rangle + \alpha_1|1\rangle, \quad |\alpha_0|^2 + |\alpha_1|^2 = 1.$$

Accordingly, a single qubit “lives” in the vector space $\mathbb{C}^2$.

Similarly we can think of systems of more than 1 qubit, which “live” in the tensor product
space of several qubit systems. For instance, a 2-qubit system has 4 basis states: $|0\rangle \otimes |0\rangle$, $|0\rangle \otimes |1\rangle$,
$|1\rangle \otimes |0\rangle$, $|1\rangle \otimes |1\rangle$. Here for instance $|1\rangle \otimes |0\rangle$ means that the first qubit is in its basis state $|1\rangle$ and
the second qubit is in its basis state $|0\rangle$. We will often abbreviate this to $|1\rangle|0\rangle$, $|1,0\rangle$, or even $|10\rangle$.

More generally, a register of n qubits has $2^n$ basis states, each of the form $|b_1\rangle \otimes |b_2\rangle \otimes \cdots \otimes |b_n\rangle$,
with $b_i \in \{0,1\}$. We can abbreviate this to $|b_1 b_2 \ldots b_n\rangle$. We will often abbreviate 0...0 to $0^n$. Since
bitstrings of length n can be viewed as integers between 0 and $2^n - 1$ (see Appendix B.2), we can
also write the basis states as numbers $|0\rangle, |1\rangle, |2\rangle, \ldots, |2^n - 1\rangle$. Note that the vector corresponding
to n-qubit basis state $|x\rangle$ is the $2^n$-dimensional vector that has a 1 at the x-th position and 0s
elsewhere (here we view x as an integer in $\{0, \ldots, 2^n - 1\}$ and we count the positions in the vector
starting from position 0). This implies that two n-qubit basis states $|x\rangle$ and $|y\rangle$ are orthogonal iff
$x \neq y$. A different way to see this orthogonality is to use the rules of tensor product (Appendix A.6):

$$\langle x|y\rangle = \langle x_1|y_1\rangle \otimes \cdots \otimes \langle x_n|y_n\rangle = \langle x_1|y_1\rangle \cdots \langle x_n|y_n\rangle.$$

Since $\langle x_k|y_k\rangle = \delta_{x_k, y_k}$, we see that basis states $|x\rangle$ and $|y\rangle$ will be orthogonal as soon as there is at
least one position k at which the bits of x and y differ.
A quantum register of n qubits can be in any superposition⁶

$$\alpha_0|0\rangle + \alpha_1|1\rangle + \cdots + \alpha_{2^n-1}|2^n - 1\rangle, \quad \sum_{j=0}^{2^n-1} |\alpha_j|^2 = 1.$$

Measuring this in the computational basis, we obtain the n-bit state $|j\rangle$ with probability $|\alpha_j|^2$.

Measuring just the first qubit of a state would correspond to the projective measurement that
has the two projectors $P_0 = |0\rangle\langle 0| \otimes I_{2^{n-1}}$ and $P_1 = |1\rangle\langle 1| \otimes I_{2^{n-1}}$. For example, applying this
measurement to the state $\frac{1}{\sqrt{3}}|0\rangle|\phi\rangle + \sqrt{\frac{2}{3}}|1\rangle|\psi\rangle$ gives outcome 0 with probability 1/3; the state then
becomes $|0\rangle|\phi\rangle$. We get outcome 1 with probability 2/3; the state then becomes $|1\rangle|\psi\rangle$. Similarly,
measuring the first n qubits of an (n + m)-qubit state in the computational basis corresponds to
the projective measurement that has $2^n$ projectors $P_j = |j\rangle\langle j| \otimes I_{2^m}$ for $j \in \{0,1\}^n$.
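
Measuring only the first qubit can be sketched as follows. The assumptions here are that the first qubit corresponds to the most significant bit of the index (so the first half of the amplitude vector has first bit 0), and that $|\phi\rangle = |0\rangle$ and $|\psi\rangle = |+\rangle$ are used as illustrative 1-qubit states; the function name is hypothetical.

```python
import math

def measure_first_qubit(state):
    """Outcome b in {0, 1} keeps the amplitudes whose first (leftmost) bit equals b.

    Returns [(p0, post0), (p1, post1)]; a post-state is None if its probability is 0.
    """
    half = len(state) // 2           # indices below `half` have first bit 0
    results = []
    for b in (0, 1):
        projected = [a if (k >= half) == bool(b) else 0 for k, a in enumerate(state)]
        p = sum(abs(a) ** 2 for a in projected)
        post = [a / math.sqrt(p) for a in projected] if p > 0 else None
        results.append((p, post))
    return results

s = 1 / math.sqrt(2)
# sqrt(1/3)|0>|phi> + sqrt(2/3)|1>|psi> with the illustrative choices |phi> = |0>, |psi> = |+>
state = [math.sqrt(1 / 3), 0, math.sqrt(2 / 3) * s, math.sqrt(2 / 3) * s]
for p, post in measure_first_qubit(state):
    print(round(p, 6), post)
```

Outcome 0 leaves the register in $|0\rangle|0\rangle$ and outcome 1 leaves it in $|1\rangle|+\rangle$, with probabilities of about 1/3 and 2/3.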

An important property that deserves to be mentioned is entanglement, which refers to quantum
correlations between different qubits. For instance, consider a 2-qubit register that is in the state

$$\frac{1}{\sqrt{2}}|00\rangle + \frac{1}{\sqrt{2}}|11\rangle.$$

Such 2-qubit states are sometimes called EPR-pairs in honor of Einstein, Podolsky, and Rosen [106],
who examined such states and their seemingly paradoxical properties. Initially neither of the two
qubits has a classical value $|0\rangle$ or $|1\rangle$. However, if we measure the first qubit and observe, say, a
$|0\rangle$, then the whole state collapses to $|00\rangle$. Thus observing the first qubit immediately fixes also
the second, unobserved qubit to a classical value. Since the two qubits that make up the register
may be far apart, this example illustrates some of the non-local effects that quantum systems can
exhibit. In general, a bipartite state $|\phi\rangle$ is called entangled if it cannot be written as a tensor
product $|\phi_A\rangle \otimes |\phi_B\rangle$ where $|\phi_A\rangle$ lives in the first space and $|\phi_B\rangle$ lives in the second.⁷

⁶ Don’t call such a multi-qubit state or register a “qubit” or an “n-qubit”: the term “qubit” only refers to the
state of a 2-dimensional system. You can use “n-qubit” as an adjective but not as a noun.
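
For two qubits there is a simple numerical test of this definition. The sketch below relies on a standard linear-algebra fact that is not stated in these notes: a 2-qubit state with amplitudes $\alpha_{00}, \alpha_{01}, \alpha_{10}, \alpha_{11}$ is a tensor product of two 1-qubit states exactly when $\alpha_{00}\alpha_{11} - \alpha_{01}\alpha_{10} = 0$. The function name and the tolerance are assumptions.

```python
import math

def is_product_state(state, tol=1e-12):
    """2-qubit state (a00, a01, a10, a11): unentangled iff a00*a11 - a01*a10 = 0."""
    a00, a01, a10, a11 = state
    return abs(a00 * a11 - a01 * a10) < tol

def tensor(phi, psi):
    """Tensor product of two amplitude vectors."""
    return [a * b for a in phi for b in psi]

s = 1 / math.sqrt(2)
epr = [s, 0, 0, s]                                  # (|00> + |11>)/sqrt(2)
product = tensor([0.6, 0.8j], [s, -s])              # an explicit tensor product state

print(is_product_state(epr))        # the EPR-pair does not factor
print(is_product_state(product))    # a tensor product passes the test
```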

At this point, a comparison with classical probability distributions may be helpful. Suppose
we have two probability spaces, A and B, the first with $2^n$ possible outcomes, the second with $2^m$
possible outcomes. A probability distribution on the first space can be described by $2^n$ numbers
(nonnegative reals summing to 1; actually there are only $2^n - 1$ degrees of freedom here) and a
distribution on the second by $2^m$ numbers. Accordingly, a product distribution on the joint space
can be described by $2^n + 2^m$ numbers. However, an arbitrary (non-product) distribution on the
joint space takes $2^{n+m}$ real numbers, since there are $2^{n+m}$ possible outcomes in total. Analogously,
an n-qubit state $|\phi_A\rangle$ can be described by $2^n$ numbers (complex numbers whose squared moduli
sum to 1), an m-qubit state $|\phi_B\rangle$ by $2^m$ numbers, and their tensor product $|\phi_A\rangle \otimes |\phi_B\rangle$ by $2^n + 2^m$
numbers. However, an arbitrary (possibly entangled) state in the joint space takes $2^{n+m}$ numbers,
since it lives in a $2^{n+m}$-dimensional space. We see that the number of parameters required to
describe quantum states is the same as the number of parameters needed to describe probability
distributions. Also note the analogy between statistical independence⁸ of two random variables A
and B and non-entanglement of the product state $|\phi_A\rangle \otimes |\phi_B\rangle$. However, despite the similarities
between probabilities and amplitudes, quantum states are much more powerful than distributions,
because amplitudes may have negative (or even complex) parts which can lead to interference
effects. Amplitudes only become probabilities when we square them. The art of quantum computing
is to use these special properties for interesting computational purposes.


1.4 Elementary gates 


A unitary that acts on a small number of qubits (say, at most 3) is often called a gate, in analogy
to classical logic gates like AND, OR, and NOT; more about that in the next chapter. The Pauli
matrices I, X, Y, Z (Appendix A.9) are examples of 1-qubit gates. For example, the bitflip gate X
(a.k.a. NOT-gate) negates the bit in the computational basis, i.e., it swaps $|0\rangle$ and $|1\rangle$. The phaseflip
gate Z puts a − in front of $|1\rangle$. Represented as 2 × 2 unitary matrices, these are

$$X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$

⁷ We often omit the tensor product symbol for such unentangled states, abbreviating $|\phi_A\rangle \otimes |\phi_B\rangle$ to $|\phi_A\rangle|\phi_B\rangle$ (you
shouldn’t abbreviate this further to $|\phi_A \phi_B\rangle$ though, unless both $|\phi_A\rangle$ and $|\phi_B\rangle$ are computational basis states). Note
that there cannot be ambiguity between tensor product and the usual matrix product in this abbreviation, because
both $|\phi_A\rangle$ and $|\phi_B\rangle$ are column vectors and hence their matrix product wouldn’t even be well-defined (the dimensions
“don’t fit”).

⁸ Two random variables A and B are independent if their joint probability distribution can be written as a product
of individual distributions for A and for B: $\Pr[A = a \wedge B = b] = \Pr[A = a] \cdot \Pr[B = b]$ for all possible values a, b.


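Another important 1-qubit gate is the phase gate $R_\phi$, which merely rotates the phase of the $|1\rangle$-state
by an angle $\phi$:

$$R_\phi|0\rangle = |0\rangle$$
$$R_\phi|1\rangle = e^{i\phi}|1\rangle$$

This corresponds to the unitary matrix

$$R_\phi = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\phi} \end{pmatrix}.$$

Note that Z is a special case of this: $Z = R_\pi$, because $e^{i\pi} = -1$. The $R_{\pi/4}$-gate is often just called
the T-gate.

Possibly the most important 1-qubit gate is the Hadamard transform, specified by:

$$H|0\rangle = \frac{1}{\sqrt{2}}|0\rangle + \frac{1}{\sqrt{2}}|1\rangle$$
$$H|1\rangle = \frac{1}{\sqrt{2}}|0\rangle - \frac{1}{\sqrt{2}}|1\rangle$$

As a unitary matrix, this is represented as

$$H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}.$$

If we apply H to initial state $|0\rangle$ and then measure, we have equal probability of observing $|0\rangle$ or
$|1\rangle$. Similarly, applying H to $|1\rangle$ and observing gives equal probability of $|0\rangle$ or $|1\rangle$. However, if we
apply H to the superposition $\frac{1}{\sqrt{2}}|0\rangle + \frac{1}{\sqrt{2}}|1\rangle$ then we obtain

$$H\left(\frac{1}{\sqrt{2}}|0\rangle + \frac{1}{\sqrt{2}}|1\rangle\right) = \frac{1}{\sqrt{2}}H|0\rangle + \frac{1}{\sqrt{2}}H|1\rangle = \frac{1}{2}(|0\rangle + |1\rangle) + \frac{1}{2}(|0\rangle - |1\rangle) = |0\rangle.$$

The positive and negative amplitudes for $|1\rangle$ have canceled each other out! This effect is called
interference, and is analogous to interference patterns between light or sound waves.

An example of a 2-qubit gate is the controlled-not gate CNOT. It negates the second bit of its
input if the first bit is 1, and does nothing if the first bit is 0:

$$\mathrm{CNOT}|0\rangle|b\rangle = |0\rangle|b\rangle$$
$$\mathrm{CNOT}|1\rangle|b\rangle = |1\rangle|1-b\rangle$$

The first qubit is called the control qubit, the second the target qubit. In matrix form, this is

$$\mathrm{CNOT} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}.$$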

More generally, if U is some n-qubit unitary matrix, then the controlled-U operation corresponds
to the following $2^{n+1} \times 2^{n+1}$ unitary matrix:

$$\begin{pmatrix} I & 0 \\ 0 & U \end{pmatrix},$$

where I is the $2^n$-dimensional identity matrix and the two 0s denote $2^n \times 2^n$ all-0 matrices.
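
The gates of this section are small enough to try out directly. The following minimal Python sketch stores each gate as a list of rows and applies it by matrix-vector multiplication; `matvec` and `controlled` are hypothetical helper names, and `controlled` simply builds the block matrix described above.

```python
import math

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def controlled(U):
    """Block matrix [[I, 0], [0, U]]: apply U to the target qubits only if the control is 1."""
    d = len(U)
    top = [[1 if c == r else 0 for c in range(2 * d)] for r in range(d)]
    bottom = [[0] * d + list(row) for row in U]
    return top + bottom

s = 1 / math.sqrt(2)
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
H = [[s, s], [s, -s]]
CNOT = controlled(X)                   # the controlled-X gate is CNOT

ket0, ket1 = [1, 0], [0, 1]
plus = matvec(H, ket0)                 # H|0> = (|0> + |1>)/sqrt(2)
print(matvec(X, ket0))                 # bitflip: |0> becomes |1>
print(matvec(Z, ket1))                 # phaseflip: |1> becomes -|1>
print(matvec(H, plus))                 # interference: the |1> amplitudes cancel
print(matvec(CNOT, [0, 0, 1, 0]))      # CNOT|10> = |11>
```

Applying H to the superposition returns (up to floating-point rounding) the vector of $|0\rangle$, which is the cancellation described in the text.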

1.5 Example: quantum teleportation


In the next chapter we will look in more detail at how we can use and combine such elementary
gates, but as an example we will here already explain teleportation [48]. Suppose there are two
parties, Alice and Bob. Alice has a qubit $\alpha_0|0\rangle + \alpha_1|1\rangle$ that she wants to send to Bob via a classical
channel. Without further resources this would be impossible, because the amplitudes $\alpha_0, \alpha_1$ may
require an infinite number of bits of precision to write them down exactly. However, suppose Alice
also shares an EPR-pair

$$\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$$

with Bob (say Alice holds the first qubit and Bob the second). Initially, their joint state is

$$(\alpha_0|0\rangle + \alpha_1|1\rangle) \otimes \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle).$$

The first two qubits belong to Alice, the third to Bob. Alice performs a CNOT on her two qubits
and then a Hadamard transform on her first qubit. Their joint 3-qubit state can now be written as

$$\begin{aligned}
&\tfrac{1}{2}|00\rangle(\alpha_0|0\rangle + \alpha_1|1\rangle) \; + \\
&\tfrac{1}{2}|01\rangle(\alpha_0|1\rangle + \alpha_1|0\rangle) \; + \\
&\tfrac{1}{2}|10\rangle(\alpha_0|0\rangle - \alpha_1|1\rangle) \; + \\
&\tfrac{1}{2}|11\rangle(\alpha_0|1\rangle - \alpha_1|0\rangle),
\end{aligned}$$

where in each term the first two qubits are Alice’s and the third qubit is Bob’s.
Alice then measures her two qubits in the computational basis and sends the result (2 random
classical bits ab) to Bob over a classical channel. Bob now knows which transformation he must
do on his qubit in order to regain the qubit $\alpha_0|0\rangle + \alpha_1|1\rangle$. First, if b = 1 then he applies a bitflip
(X-gate) on his qubit; second if a = 1 then he applies a phaseflip (Z-gate). For instance, if Alice
sent ab = 11, then Bob knows that his qubit is $\alpha_0|1\rangle - \alpha_1|0\rangle$. A bitflip followed by a phaseflip
will give him Alice’s original qubit $\alpha_0|0\rangle + \alpha_1|1\rangle$. In fact, if Alice’s qubit had been entangled with
some other qubits, then teleportation preserves this entanglement: Bob then receives a qubit that
is entangled in the same way as Alice’s original qubit was.
Note that the qubit on Alice’s side has been destroyed: teleporting moves a qubit from Alice to
Bob, rather than copying it. In fact, copying an unknown qubit is impossible [249], see Exercise 10.
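
The whole protocol can be simulated on an 8-dimensional amplitude vector. The sketch below is one possible way to do so, under these assumptions: the index of a basis state $|q_1 q_2 q_3\rangle$ is $4q_1 + 2q_2 + q_3$, the function name `teleport_branches` is hypothetical, and instead of sampling Alice’s measurement it enumerates all four outcomes ab and applies Bob’s correction in each.

```python
import math

S = 1 / math.sqrt(2)

def teleport_branches(a0, a1):
    """Return {(a, b): (probability, Bob's corrected qubit)} for Alice's qubit a0|0> + a1|1>."""
    # Initial joint state: Alice's qubit (q1) tensor the EPR-pair on (q2, q3).
    state = [0j] * 8
    for q1, amp in ((0, a0), (1, a1)):
        for e in (0, 1):
            state[4 * q1 + 2 * e + e] = amp * S
    # CNOT with control q1 and target q2.
    after_cnot = [0j] * 8
    for idx, amp in enumerate(state):
        q1, q2, q3 = idx >> 2, (idx >> 1) & 1, idx & 1
        after_cnot[4 * q1 + 2 * (q2 ^ q1) + q3] = amp
    # Hadamard on q1: mixes the q1 = 0 and q1 = 1 halves of the vector.
    after_h = [0j] * 8
    for rest in range(4):
        x0, x1 = after_cnot[rest], after_cnot[4 + rest]
        after_h[rest] = S * (x0 + x1)
        after_h[4 + rest] = S * (x0 - x1)
    branches = {}
    for a in (0, 1):
        for b in (0, 1):
            bob = [after_h[4 * a + 2 * b], after_h[4 * a + 2 * b + 1]]
            p = abs(bob[0]) ** 2 + abs(bob[1]) ** 2
            bob = [x / math.sqrt(p) for x in bob]      # collapse after outcome ab
            if b == 1:
                bob = [bob[1], bob[0]]                 # bitflip X
            if a == 1:
                bob = [bob[0], -bob[1]]                # phaseflip Z
            branches[(a, b)] = (p, bob)
    return branches

for outcome, (p, bob) in teleport_branches(0.6, 0.8j).items():
    print(outcome, round(p, 6), bob)
```

Each of the four outcomes occurs with probability 1/4, and in every branch Bob ends up with the amplitudes Alice started with, up to floating-point rounding.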


Exercises 


1. (a) What is the inner product between the real vectors (0,1,0,1) and (0,1,1,1)?
(b) What is the inner product between the states $|0101\rangle$ and $|0111\rangle$?


2. Compute the result of applying a Hadamard transform to both qubits of $|0\rangle \otimes |1\rangle$ in two ways
(the first way using tensor product of vectors, the second using tensor product of matrices),
and show that the two results are equal:

$$H|0\rangle \otimes H|1\rangle = (H \otimes H)(|0\rangle \otimes |1\rangle).$$


3. Show that a bitflip operation, preceded and followed by Hadamard transforms, equals a
phaseflip operation: HXH = Z.
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10. 


11. 


12. 


13. 


. Show that surrounding a CNOT gate with Hadamard gates switches the role of the control-bit 


and target-bit of the CNOT: (H @ H)CNOT(H & H) is the 2-qubit gate where the second 
bit controls whether the first bit is negated (i.e., flipped). 


. Simplify the following: ((0| & I)(ag9|00) + ao1|01) + 49/10) + a4;|11)). 


. Prove that an EPR-pair Z (|00) + |11)) is an entangled state, i.e., that it cannot be written 


as the tensor product of two separate qubits. 


7. Suppose we have the state $\frac{1}{\sqrt{2}}(|0\rangle|\phi\rangle + |1\rangle|\psi\rangle)$, where $|\phi\rangle$ and $|\psi\rangle$ are unknown normalized 
quantum states with the same number of qubits. Suppose we apply a Hadamard gate to the 
first qubit and then measure that first qubit in the computational basis. Give the probability 
of measurement outcome 1, as a function of the states $|\phi\rangle$ and $|\psi\rangle$. 


8. Give the 2-outcome projective measurement on a 2-qubit space that measures the parity (i.e., 
sum modulo 2) of 2-bit basis states. Also give the corresponding observable. 


9. (H) Show that the success probability of the POVM at the end of Section 1.2.2 can be 
increased from 1/4 to $1/(2 + \sqrt{2})$. 


10. (H) Prove the quantum no-cloning theorem: there does not exist a 2-qubit unitary U that 
maps 


$$|\phi\rangle|0\rangle \mapsto |\phi\rangle|\phi\rangle$$

for every qubit $|\phi\rangle$. 


11. Show that unitaries cannot “delete” information: there is no 1-qubit unitary U that maps 
$|\phi\rangle \mapsto |0\rangle$ for every 1-qubit state $|\phi\rangle$. 


12. Suppose Alice and Bob are not entangled. If Alice sends a qubit to Bob, then this can 
give Bob at most one bit of information about Alice.⁹ However, if they share an EPR-pair, 
$|\psi\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$, then they can transmit two classical bits by sending one qubit over the 
channel; this is called superdense coding. This exercise will show how this works. 


(a) They start with a shared EPR-pair, $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$. Alice has classical bits a and b. 
Suppose she does an X-gate on her half of the EPR-pair if a = 1, followed by a Z-gate 
if b = 1 (she does both if ab = 11, and neither if ab = 00). Write the resulting 2-qubit 
state for the four different cases that ab could take. 


(b) Suppose Alice sends her half of the state to Bob, who now has two qubits. Show that 
Bob can determine both a and b from his state, using Hadamard and CNOT gates, 
followed by a measurement in the computational basis. 


13. Alice and Bob share an EPR-pair, $|\psi\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$. 


(a) Let C be a 2 × 2 matrix. Show that $\mathrm{Tr}((C \otimes I)|\psi\rangle\langle\psi|) = \frac{1}{2}\mathrm{Tr}(C)$. 


(b) (H) Alice could apply one of the 4 Pauli matrices (I, X,Y, Z) to her qubit. Use part (a) 
to show that the 4 resulting 2-qubit states form an orthonormal set. 


(c) Suppose Alice applies one of the 4 Pauli matrices to her qubit and then sends that qubit 
to Bob. Give the 4 projectors of a 4-outcome projective measurement that Bob could 
do on his 2 qubits to find out which Pauli matrix Alice actually applied. 


⁹ This is actually a deep statement, a special case of Holevo’s theorem. More about this may be found in Chapter 15. 


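14. Let $\theta \in [0, 2\pi)$, $U_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$, $|\phi\rangle = U_\theta|0\rangle$ and $|\phi^\perp\rangle = U_\theta|1\rangle$. 


(a) Show that $ZX|\phi^\perp\rangle = |\phi\rangle$. 
(b) Show that an EPR-pair, $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$, can also be written as $\frac{1}{\sqrt{2}}(|\phi\rangle|\phi\rangle + |\phi^\perp\rangle|\phi^\perp\rangle)$. 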


(c) Suppose Alice and Bob start with an EPR-pair. Alice applies $U_\theta^{-1}$ to her qubit and then 
measures it in the computational basis. What pure state does Bob have if her outcome 
was 0, and what pure state does he have if her outcome was 1? 


(d) Suppose Alice knows the number $\theta$ but Bob does not. Give a protocol that uses one 
EPR-pair and 1 classical bit of communication where Bob ends up with the qubit $|\phi\rangle$ 
(in contrast to general teleportation of an unknown qubit, which uses 1 EPR-pair and 2 
bits of communication). 
Chapter 2 


The Circuit Model and the 
Deutsch-Jozsa Algorithm 


2.1 Quantum computation 


Below we explain how a quantum computer can apply computational steps to its register of qubits. 
Two models exist for this: the quantum Turing machine [97, 53] and the quantum circuit model [98, 
251]. These models are equivalent, in the sense that they can simulate each other in polynomial 
time, assuming the circuits are appropriately “uniform.” We only explain the circuit model here, 
which is more popular among researchers. 


2.1.1 Classical circuits 


In classical complexity theory, a Boolean circuit is a finite directed acyclic graph with AND, OR, 
and NOT gates. It has n input nodes, which contain the n input bits (n > 0). The internal 
nodes are AND, OR, and NOT gates, and there are one or more designated output nodes. The 
initial input bits are fed into AND, OR, and NOT gates according to the circuit, and eventually 
the output nodes assume some value. We say that a circuit computes some Boolean function 
$f : \{0,1\}^n \to \{0,1\}$ if the output nodes get the right value f(x) for every input $x \in \{0,1\}^n$. 

A circuit family is a set $C = \{C_n\}$ of circuits, one for each input size n. Each circuit has one 
output bit. Such a family recognizes or decides a language $L \subseteq \{0,1\}^* = \cup_{n \geq 0}\{0,1\}^n$ if, for every 
n and every input $x \in \{0,1\}^n$, the circuit $C_n$ outputs 1 if $x \in L$ and outputs 0 otherwise.¹ Such a 
circuit family is uniformly polynomial if there is a deterministic Turing machine that outputs $C_n$ 
given n as input, using space logarithmic in n.² Note that the size (number of gates) of the circuits 
$C_n$ can then grow at most polynomially with n. It is known that uniformly polynomial circuit 
families are equal in power to polynomial-time deterministic Turing machines: a language L can 
be decided by a uniformly polynomial circuit family iff $L \in P$ [198, Theorem 11.5], where P is the 
class of languages decidable by polynomial-time Turing machines. 

¹ We can think of a language L as a sequence of Boolean functions $f_n : \{0,1\}^n \to \{0,1\}$, where $f_n$ takes value 1 
exactly on the n-bit strings that are in L. The circuit $C_n$ then computes the function $f_n$. 

² Logarithmic space implies time that’s at most polynomial in n, because such a machine will have only poly(n) 
different internal states, so it either halts after poly(n) steps or cycles forever. 

Similarly we can consider randomized circuits. These receive, in addition to the n input bits, 
also some random bits (“coin flips”) as input. A randomized circuit computes a function f if it 
successfully outputs the right answer f(x) with probability at least 2/3 for every x (probability 
taken over the values of the random bits). Randomized circuits are equal in power to randomized 
Turing machines: a language L can be decided by a uniformly polynomial randomized circuit 
family iff $L \in BPP$, where BPP (“Bounded-error Probabilistic Polynomial time”) is the class of 
languages that can efficiently be recognized by randomized Turing machines with success probability 
at least 2/3. Because we can efficiently reduce the error probability of randomized algorithms (see 
Appendix B.2), the particular value 2/3 doesn’t really matter here and may be replaced by any 
fixed constant in (1/2, 1). 


2.1.2 Quantum circuits 


A quantum circuit (also called quantum network or quantum gate array) generalizes the idea of 
classical circuit families, replacing the AND, OR, and NOT gates by elementary quantum gates. A 
quantum gate is a unitary transformation on a small (usually 1, 2, or 3) number of qubits. We saw 
a number of examples already in the previous chapter: the bitflip gate X, the phaseflip gate Z, 
the Hadamard gate H. The main 2-qubit gate we have seen is the controlled-NOT (CNOT) gate. 
Adding another control qubit, we get the 3-qubit Toffoli gate, also called controlled-controlled-not 
(CCNOT) gate. This negates the third bit of its input if both of the first two bits are 1. The 
Toffoli gate is important because it is complete for classical reversible computation: any classical 
computation can be implemented by a circuit of Toffoli gates. This is easy to see: using auxiliary 
wires with fixed values, Toffoli can implement AND (fix the 3rd ingoing wire to 0) and NOT (fix the 
1st and 2nd ingoing wire to 1). It is known that AND and NOT-gates together suffice to implement 
any classical Boolean circuit, so if we can apply (or simulate) Toffoli gates, we can implement any 
classical computation in a reversible manner. 
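
The AND and NOT constructions just described can be checked on classical bits. The following is a minimal sketch, not part of the original notes; the function names `toffoli`, `and_via_toffoli`, and `not_via_toffoli` are illustrative assumptions.

```python
def toffoli(a, b, c):
    """Classical action of the Toffoli (CCNOT) gate on three bits.

    The third bit is flipped iff both control bits are 1; the first
    two bits pass through unchanged, so the map is reversible.
    """
    return a, b, c ^ (a & b)


def and_via_toffoli(a, b):
    # Fix the 3rd ingoing wire to 0: the 3rd outgoing wire carries a AND b.
    return toffoli(a, b, 0)[2]


def not_via_toffoli(c):
    # Fix the 1st and 2nd ingoing wires to 1: the 3rd wire carries NOT c.
    return toffoli(1, 1, c)[2]


# Toffoli is its own inverse: applying it twice restores every input.
for bits in [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]:
    assert toffoli(*toffoli(*bits)) == bits
```

The auxiliary wires with fixed values are what make the irreversible AND fit into a reversible gate: the inputs a and b are still present on the first two outgoing wires.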

Mathematically, such elementary quantum gates can be composed into bigger unitary operations 
by taking tensor products (if gates are applied in parallel to different parts of the register), and 
ordinary matrix products (if gates are applied sequentially). We have already seen a simple example 
of such a circuit of elementary gates in the previous chapter, namely to implement teleportation. 

For example, if we apply the Hadamard gate H to each bit in a register of n zeroes, we obtain 


$$\frac{1}{\sqrt{2^n}} \sum_{j \in \{0,1\}^n} |j\rangle,$$


which is a superposition of all n-bit strings. More generally, if we apply $H^{\otimes n}$ to an initial state $|i\rangle$, 
with $i \in \{0,1\}^n$, we obtain 


$$H^{\otimes n}|i\rangle = \frac{1}{\sqrt{2^n}} \sum_{j \in \{0,1\}^n} (-1)^{i \cdot j}|j\rangle, \qquad (2.1)$$


where $i \cdot j = \sum_{k=1}^{n} i_k j_k$ denotes the inner product of the n-bit strings $i, j \in \{0,1\}^n$. For example: 


$$H^{\otimes 2}|01\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle) \otimes \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle) = \frac{1}{2} \sum_{j \in \{0,1\}^2} (-1)^{01 \cdot j}|j\rangle.$$


Note that Hadamard happens to be its own inverse (it’s unitary and Hermitian, hence $H = H^* = 
H^{-1}$), so applying it once more on the right-hand side of the above equation would give us back 
$|01\rangle$. The n-fold Hadamard transform will be very useful for quantum algorithms. 
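
Equation (2.1) can be checked numerically for small n. The sketch below is an illustration rather than anything from the notes: it stores a state as a plain list of 2^n amplitudes, and the helper names `hadamard_n` and `hadamard_formula` are assumptions made for this example. It applies H qubit by qubit and compares the result with the closed form.

```python
import math


def hadamard_n(state, n):
    """Apply H to each of the n qubits of a state vector of length 2**n."""
    state = list(state)
    for q in range(n):
        bit = 1 << q
        for idx in range(len(state)):
            if idx & bit == 0:
                a, b = state[idx], state[idx | bit]
                state[idx] = (a + b) / math.sqrt(2)
                state[idx | bit] = (a - b) / math.sqrt(2)
    return state


def hadamard_formula(i, n):
    """Right-hand side of Equation (2.1) for the basis state |i>."""
    norm = 1 / math.sqrt(2 ** n)
    # i . j mod 2 is the parity of the bitwise AND of i and j.
    return [norm * (-1) ** (bin(i & j).count("1") % 2) for j in range(2 ** n)]


n = 2
basis_01 = [0.0] * 4
basis_01[0b01] = 1.0                 # the basis state |01>
out = hadamard_n(basis_01, n)        # amplitudes of H^{(x)2}|01>
back = hadamard_n(out, n)            # applying H^{(x)2} again undoes it
```

Here `out` agrees with `hadamard_formula(0b01, 2)` up to floating-point error, and `back` returns to the basis state, matching the remark that Hadamard is its own inverse.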
As in the classical case, a quantum circuit is a finite directed acyclic graph of input nodes, 
gates, and output nodes. There are n nodes that contain the input (as classical bits); in addition 
we may have some more input nodes that are initially |0) (“workspace”). The internal nodes of the 
quantum circuit are quantum gates that each operate on at most two or three qubits of the state. 
The gates in the circuit transform the initial state vector into a final state, which will generally 
be a superposition. We measure some or all qubits of this final state in the computational basis 
in order to (probabilistically) obtain a classical output to the algorithm. We can think of the 
measurement of one qubit in the computational basis as one special type of gate. We may assume 
without much loss of generality that such measurements only happen at the very end of the circuit 
(see Exercise 7). 

What about the more general kinds of measurements discussed in Section 1.2.2? If we want 
to apply such a measurement in the circuit model, we will have to implement it using a circuit 
of elementary gates followed by a measurement in the computational basis. For example, suppose 
projectors $P_0$ and $P_1$ form a 2-outcome projective measurement on an n-qubit space ($P_0 + P_1 = I_{2^n}$). 
Assume for simplicity that $P_0$ and $P_1$ both have rank $2^n/2$. Then there exists a unitary U that 
maps an n-qubit state $|\phi\rangle$ to a state whose first qubit is $|0\rangle$ whenever $P_0|\phi\rangle = |\phi\rangle$, and that maps 
n-qubit $|\psi\rangle$ to a state whose first qubit is $|1\rangle$ whenever $P_1|\psi\rangle = |\psi\rangle$. We can now implement the 
projective measurement by first applying a circuit that implements U, and then measuring (in the 
computational basis) the first qubit of the resulting state. The minimal-size circuit to implement 
U could be very large (i.e., expensive) if the projective measurement is complicated, but that is 
how it should be. 

To draw quantum circuits, the convention is to let time progress from left to right: we start with 
the initial state on the left. Each qubit is pictured as a horizontal wire, and the circuit prescribes 
which gates are to be applied to which wires. Single-qubit gates like X and H just act on one 
wire, while multi-qubit gates such as the CNOT act on multiple wires simultaneously. When one 
qubit “controls” the application of a gate to another qubit, then the controlling wire is drawn with 
a dot linked vertically to the gate that is applied to the target qubit. This happens for instance 
with the CNOT, where the applied single-qubit gate is X, usually drawn as ‘⊕’ in a circuit picture 
(similarly, the Toffoli gate is drawn in a circuit with a dot on the two control wires and an ‘⊕’ on 
the target wire). Figure 2.1 gives a simple example on two qubits, initially in basis state $|00\rangle$: first 
apply H to the 1st qubit, then CNOT to both qubits (with the first qubit acting as the control), 
and then Z to the last qubit. The resulting state is $\frac{1}{\sqrt{2}}(|00\rangle - |11\rangle)$. 


|0⟩ ──H──●───────
         │
|0⟩ ─────⊕───Z───


Figure 2.1: Simple circuit for turning |00⟩ into an entangled state 
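
The circuit of Figure 2.1 is small enough to follow by hand or with a few lines of code. The following sketch assumes a 4-entry amplitude list indexed as 2*q1 + q2 (first qubit is the high bit); this indexing and the helper names are choices made for the illustration, not notation from the notes.

```python
import math


def apply_h_first(state):
    """H on the first qubit; amplitudes are indexed as 2*q1 + q2."""
    s = 1 / math.sqrt(2)
    a00, a01, a10, a11 = state
    return [s * (a00 + a10), s * (a01 + a11), s * (a00 - a10), s * (a01 - a11)]


def apply_cnot(state):
    """CNOT with the first qubit as control: swaps |10> and |11>."""
    a00, a01, a10, a11 = state
    return [a00, a01, a11, a10]


def apply_z_second(state):
    """Z on the second qubit: negates amplitudes where q2 = 1."""
    a00, a01, a10, a11 = state
    return [a00, -a01, a10, -a11]


state = [1.0, 0.0, 0.0, 0.0]                      # |00>
state = apply_h_first(state)                      # (|00> + |10>)/sqrt(2)
state = apply_cnot(state)                         # (|00> + |11>)/sqrt(2)
state = apply_z_second(state)                     # (|00> - |11>)/sqrt(2)
```

The gates are applied in the order they appear from left to right in the picture, which is the reverse of the order in which the corresponding matrices would be written in a product.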


Note that if we have a circuit for unitary U, it is very easy to find a circuit for the inverse $U^{-1}$ 
with the same complexity: just reverse the order of the gates, and take the inverse of each gate. 
For example, if $U = U_1U_2U_3$, then $U^{-1} = U_3^{-1}U_2^{-1}U_1^{-1}$. 


³ Note that the number of wires (qubits) going into a unitary must equal the number of wires going out because 
a unitary is always invertible (reversible). This differs from the case of classical circuits, where non-reversible gates 
like AND have more wires going in than out. 
In analogy to the classical class BPP, we will define BQP (“Bounded-error Quantum Polynomial 
time”) as the class of languages that can efficiently be computed with success probability 
at least 2/3 by (a family of) quantum circuits whose size grows at most polynomially with the 
input length. We will study this quantum complexity class and its relation with various classical 
complexity classes in more detail in Chapter 13. 


2.2 Universality of various sets of elementary gates 


Which set of elementary gates should we allow? There are several reasonable choices. 


(1) The set of all 1-qubit operations together with the 2-qubit CNOT gate is universal, 
meaning that any other unitary transformation can be built from these gates. 


Allowing all 1-qubit gates is not very realistic from an implementational point of view, as there are 
continuously many of them, and we cannot expect experimentalists to implement gates to infinite 
precision. However, the model is usually restricted, only allowing a small finite set of 1-qubit gates 
from which all other 1-qubit gates can be efficiently approximated. 


(2) The set consisting of CNOT, Hadamard, and the phase-gate $T = R_{\pi/4}$ is universal 
in the sense of approximation, meaning that any other unitary can be arbitrarily well 
approximated using circuits of only these gates. The Solovay-Kitaev theorem [196, 
Appendix 3] says that this approximation is quite efficient: we can approximate any 
gate on 1 or 2 qubits up to error $\varepsilon$ using a number of gates (from our small set) that 
is only polylog($1/\varepsilon$), i.e., polynomial in the logarithm of $1/\varepsilon$; in particular, simulating 
arbitrary gates up to exponentially small error costs only a polynomial overhead. 


It is often convenient to restrict to real numbers and use an even smaller set of gates: 


(3) The set of Hadamard and Toffoli (CCNOT) is universal for all unitaries with real 
entries in the sense of approximation, meaning that any unitary with only real entries 
can be arbitrarily well approximated using circuits of only these gates. 


2.3 Quantum parallelism 


One uniquely quantum-mechanical effect that we can use for building quantum algorithms is quantum 
parallelism. Suppose we have a classical algorithm that computes some function $f : \{0,1\}^n \to 
\{0,1\}$. Then we can build a quantum circuit U (consisting only of Toffoli gates) that maps 
$|z\rangle|0\rangle \mapsto |z\rangle|f(z)\rangle$ for every $z \in \{0,1\}^n$. Now suppose we apply U to a superposition of all inputs z 
(which is easy to build using n Hadamard transforms): 


$$U\left(\frac{1}{\sqrt{2^n}} \sum_{z \in \{0,1\}^n} |z\rangle|0\rangle\right) = \frac{1}{\sqrt{2^n}} \sum_{z \in \{0,1\}^n} |z\rangle|f(z)\rangle.$$


We applied U just once, but the final superposition contains f(z) for all $2^n$ input values z! However, 
by itself this is not very useful and does not give more than classical randomization, since observing 
the final superposition will give just one uniformly random $|z\rangle|f(z)\rangle$ and all other information will 
be lost. As we will see below, quantum parallelism needs to be combined with the effects of 
interference and entanglement in order to get something that is better than classical. 
2.4 The early algorithms 


The two best-known successes of quantum algorithms so far are Shor’s factoring algorithm from 
1994 [228] and Grover’s search algorithm from 1996 [125], which will be explained in later chapters. 
Here we describe some of the earlier quantum algorithms that preceded Shor’s and Grover’s. 

Virtually all quantum algorithms work with queries in some form or other. We will explain 
this model here. It may look contrived at first, but eventually will lead smoothly to Shor’s and 
Grover’s algorithms. We should, however, emphasize that the query complexity model differs from 
the standard model described above, because the input is now given as a “black-box” (also some- 
times called an “oracle”). This means that the exponential quantum-classical separations that we 
describe below do not by themselves give exponential quantum-classical separations in the standard 
circuit model (the same applies to Simon’s algorithm in the next chapter). 

To explain the query setting, consider an N-bit input $x = (x_0, \ldots, x_{N-1}) \in \{0,1\}^N$. Usually we 
will have $N = 2^n$, so that we can address bit $x_i$ using an n-bit index i. One can think of the input 
as an N-bit memory which we can access at any point of our choice (a “Random Access Memory” 
or RAM). For example, a memory of N = 1024 bits can be indexed by addresses $i \in \{0,1\}^{10}$ of 
n = 10 bits each. A memory access is via a so-called “black-box,” which is equipped to output the 
bit $x_i$ on input i. As a quantum operation, this is the following unitary mapping on n + 1 qubits: 


$$O_x : |i, 0\rangle \mapsto |i, x_i\rangle.$$


The first n qubits of the state are called the address bits (or address register), while the (n + 1)st 
qubit is called the target bit. Since this mapping must be unitary, we also have to specify what 
happens if the initial value of the target bit is 1. Therefore we actually let $O_x$ be the following 
unitary transformation: 


$$O_x : |i, b\rangle \mapsto |i, b \oplus x_i\rangle,$$


here $i \in \{0,1\}^n$, $b \in \{0,1\}$, and $\oplus$ denotes exclusive-or (addition modulo 2). In matrix representation, 
this $O_x$ is now a permutation matrix and hence unitary. Note that a quantum computer can 
apply $O_x$ on a superposition of various i, something a classical computer cannot do. One application 
of this black-box is called a query, and counting the required number of queries to compute 
this or that function of x is something we will do a lot in the first half of these notes. 

Given the ability to make a query of the above type, we can also make a query of the form 
$|i\rangle \mapsto (-1)^{x_i}|i\rangle$ by setting the target bit to the state $|-\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle) = H|1\rangle$: 


$$O_x(|i\rangle|-\rangle) = |i\rangle \frac{1}{\sqrt{2}}(|x_i\rangle - |1 - x_i\rangle) = (-1)^{x_i}|i\rangle|-\rangle.$$

This ±-kind of query puts the output variable in the phase of the state: if $x_i$ is 1 then we get 
a −1 in the phase of basis state $|i\rangle$; if $x_i = 0$ then nothing happens to $|i\rangle$.⁵ This “phase-query” 
or “phase-oracle” is sometimes more convenient than the standard type of query. We denote the 
corresponding n-qubit unitary transformation by $O_{x,\pm}$. 
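
The step from a standard query to a phase-query can be verified directly on amplitudes. The sketch below is illustrative only: it stores the state on n + 1 qubits as a list indexed by 2*i + b, and the names `standard_query` and `basis_times_minus` as well as the 4-bit input `x` are assumptions for the example.

```python
import math


def standard_query(state, x):
    """Apply O_x : |i, b> -> |i, b XOR x_i>; amplitudes indexed as 2*i + b."""
    out = [0.0] * len(state)
    for idx, amp in enumerate(state):
        i, b = idx >> 1, idx & 1
        out[2 * i + (b ^ x[i])] += amp
    return out


def basis_times_minus(i, num_addresses):
    """The state |i>|-> with |-> = (|0> - |1>)/sqrt(2)."""
    state = [0.0] * (2 * num_addresses)
    state[2 * i] = 1 / math.sqrt(2)
    state[2 * i + 1] = -1 / math.sqrt(2)
    return state


x = [0, 1, 1, 0]   # illustrative 4-bit input, so n = 2 address bits
for i in range(len(x)):
    before = basis_times_minus(i, len(x))
    after = standard_query(before, x)
    sign = (-1) ** x[i]
    # O_x(|i>|->) equals (-1)^{x_i} |i>|->, as in the derivation above.
    assert all(abs(a - sign * b) < 1e-12 for a, b in zip(after, before))
```

Because `standard_query` only permutes amplitudes, it is a permutation matrix in disguise; the sign appears solely because the target qubit was prepared in $|-\rangle$.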


⁴ It is a common rookie mistake to confuse the N bits of x with the n address bits; don’t fall for this! 

⁵ This is sometimes called the “phase kick-back trick.” Note that for $|+\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$, we have $O_x(|i\rangle|+\rangle) = 
|i\rangle|+\rangle$ irrespective of what x is. This allows us to control on which part of the state a phase-query is applied: we put 
the control qubit in state $|-\rangle$ for indices i where we want to apply the phase-query, and in state $|+\rangle$ for the indices 
where we do not want to apply a phase-query. 
2.4.1 Deutsch-Jozsa 


Deutsch-Jozsa problem [99]: 

For $N = 2^n$, we are given $x \in \{0,1\}^N$ such that either 

(1) all $x_i$ have the same value (“constant”), or 

(2) N/2 of the $x_i$ are 0 and N/2 are 1 (“balanced”). 

The goal is to find out whether x is constant or balanced. 


The algorithm of Deutsch and Jozsa is as follows. We start in the n-qubit zero state $|0^n\rangle$, apply 
a Hadamard transform to each qubit, apply a query (in its ±-form), apply another Hadamard to 
each qubit, and then measure the final state. As a unitary transformation, the algorithm would be 
$H^{\otimes n}O_{x,\pm}H^{\otimes n}$. We have drawn the corresponding quantum circuit in Figure 2.2 (where time again 
progresses from left to right). Note that the number of wires going into the query is n, not N; the 
basis states on this sequence of wires specify an n-bit address. 


|0⟩ ──H──┌─────────┐──H──
|0⟩ ──H──│ O_{x,±} │──H──  measure
|0⟩ ──H──└─────────┘──H──


Figure 2.2: The Deutsch-Jozsa algorithm for n = 3 


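Let us follow the state through these operations. Initially we have the state $|0^n\rangle$. By Equation (2.1) 
on page 14, after the first Hadamard transforms we have obtained the uniform superposition 
of all i: 


$$\frac{1}{\sqrt{2^n}} \sum_{i \in \{0,1\}^n} |i\rangle.$$


The $O_{x,\pm}$-query turns this into 


$$\frac{1}{\sqrt{2^n}} \sum_{i \in \{0,1\}^n} (-1)^{x_i}|i\rangle.$$


Applying the second batch of Hadamards gives (again by Equation (2.1)) the final superposition 


$$\frac{1}{2^n} \sum_{i \in \{0,1\}^n} (-1)^{x_i} \sum_{j \in \{0,1\}^n} (-1)^{i \cdot j}|j\rangle,$$


where $i \cdot j = \sum_{k=1}^{n} i_k j_k$ as before. Since $i \cdot 0^n = 0$ for all $i \in \{0,1\}^n$, we see that the amplitude of 
the $|0^n\rangle$-state in the final superposition is 


$$\frac{1}{2^n} \sum_{i \in \{0,1\}^n} (-1)^{x_i} = \begin{cases} 1 & \text{if } x_i = 0 \text{ for all } i, \\ -1 & \text{if } x_i = 1 \text{ for all } i, \\ 0 & \text{if } x \text{ is balanced.} \end{cases}$$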


Hence the final observation will yield $|0^n\rangle$ if x is constant and will yield some other state if x 
is balanced. Accordingly, the Deutsch-Jozsa problem can be solved with certainty using only 1 
quantum query and O(n) other operations (the original solution of Deutsch and Jozsa used 2 
queries, the 1-query solution is from [91]). 
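
The three cases for the amplitude of $|0^n\rangle$ can be reproduced with a small state-vector simulation. This is a sketch under the assumption that the state is stored as a list of N = 2^n amplitudes; `hadamard_all` and `deutsch_jozsa_amplitude` are names invented for the illustration.

```python
import math


def hadamard_all(state):
    """Apply H to every qubit of a state vector (length a power of 2)."""
    state = list(state)
    bit = 1
    while bit < len(state):
        for idx in range(len(state)):
            if idx & bit == 0:
                a, b = state[idx], state[idx | bit]
                state[idx] = (a + b) / math.sqrt(2)
                state[idx | bit] = (a - b) / math.sqrt(2)
        bit <<= 1
    return state


def deutsch_jozsa_amplitude(x):
    """Amplitude of |0^n> after H^n, one phase-query to x, and H^n."""
    N = len(x)
    state = [0.0] * N
    state[0] = 1.0                                               # |0^n>
    state = hadamard_all(state)                                  # uniform superposition
    state = [(-1) ** x[i] * amp for i, amp in enumerate(state)]  # phase-query
    state = hadamard_all(state)
    return state[0]


amp_const0 = deutsch_jozsa_amplitude([0] * 8)
amp_const1 = deutsch_jozsa_amplitude([1] * 8)
amp_balanced = deutsch_jozsa_amplitude([0, 1, 1, 0, 1, 0, 0, 1])
```

Up to floating-point error, the three amplitudes are 1, −1, and 0, so measuring yields $|0^n\rangle$ with probability 1 for constant x and with probability 0 for balanced x.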

In contrast, it is easy to see that any classical deterministic algorithm needs at least N/2 +1 
queries: if it has made only N/2 queries and seen only Os, the correct output is still undetermined. 
However, a classical algorithm can solve this problem efficiently if we allow a small error probability: 
just query x at two random positions, output “constant” if those bits are the same and “balanced” 
if they are different. This algorithm outputs the correct answer with probability 1 if x is constant 
and outputs the correct answer with probability 1/2 if x is balanced. Thus the quantum-classical 
separation of this problem only holds if we consider algorithms without error probability. 


2.4.2 Bernstein-Vazirani 


Bernstein-Vazirani problem [53]: 
For $N = 2^n$, we are given $x \in \{0,1\}^N$ with the property that there is some unknown $a \in \{0,1\}^n$ 
such that $x_i = (i \cdot a) \bmod 2$. The goal is to find a. 


The Bernstein-Vazirani algorithm is exactly the same as the Deutsch-Jozsa algorithm, but now 
the final observation miraculously yields a. Since $(-1)^{x_i} = (-1)^{(i \cdot a) \bmod 2} = (-1)^{i \cdot a}$, we can write 
the state obtained after the query as: 


$$\frac{1}{\sqrt{2^n}} \sum_{i \in \{0,1\}^n} (-1)^{x_i}|i\rangle = \frac{1}{\sqrt{2^n}} \sum_{i \in \{0,1\}^n} (-1)^{i \cdot a}|i\rangle.$$


Since Hadamard is its own inverse, from Equation (2.1) we can see that applying a Hadamard to 
each qubit of the above state will turn it into the classical state $|a\rangle$. This solves the Bernstein-Vazirani 
problem with 1 query and O(n) other operations. In contrast, any classical algorithm 
(even a randomized one with small error probability) needs to ask n queries for information-theoretic 
reasons: the final answer consists of n bits and one classical query gives at most 1 bit of information. 

Bernstein and Vazirani also defined a recursive version of this problem, which can be solved 
exactly by a quantum algorithm in poly(n) steps, but for which every classical randomized algorithm 
needs $n^{\Omega(\log n)}$ steps. 
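
The non-recursive Bernstein-Vazirani algorithm can be simulated in the same way as Deutsch-Jozsa. The following is a minimal sketch, assuming n-bit strings are stored as Python integers and the state as a list of 2^n amplitudes; the function names are hypothetical.

```python
import math


def hadamard_all(state):
    """Apply H to every qubit of a state vector (length a power of 2)."""
    state = list(state)
    bit = 1
    while bit < len(state):
        for idx in range(len(state)):
            if idx & bit == 0:
                a, b = state[idx], state[idx | bit]
                state[idx] = (a + b) / math.sqrt(2)
                state[idx | bit] = (a - b) / math.sqrt(2)
        bit <<= 1
    return state


def bernstein_vazirani(a, n):
    """Run the algorithm on the input x_i = (i . a) mod 2.

    Returns the most likely measurement outcome and its probability.
    """
    N = 2 ** n
    x = [bin(i & a).count("1") % 2 for i in range(N)]            # hidden input
    state = hadamard_all([1.0] + [0.0] * (N - 1))
    state = [(-1) ** x[i] * amp for i, amp in enumerate(state)]  # one phase-query
    state = hadamard_all(state)
    probs = [amp * amp for amp in state]
    outcome = max(range(N), key=probs.__getitem__)
    return outcome, probs[outcome]


outcome, prob = bernstein_vazirani(0b101, 3)
```

For every a the returned probability is 1 up to rounding, which reflects that the final state is exactly the basis state $|a\rangle$.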


Exercises 


1. Is the controlled-NOT operation C Hermitian? Determine $C^{-1}$. 


2. Construct a CNOT from two Hadamard gates and one controlled-Z (the controlled-Z gate 
maps $|11\rangle \mapsto -|11\rangle$ and acts like the identity on the other basis states). 


3. A SWAP-gate interchanges two qubits: it maps basis state $|a, b\rangle$ to $|b, a\rangle$. Implement a SWAP-gate 
using a few CNOTs (when using a CNOT, you’re allowed to use either of the 2 bits as 
the control, but be explicit about this). 


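4. Show that every 1-qubit unitary with real entries can be written as a rotation matrix, possibly 
preceded and followed by Z-gates. In other words, show that for every 2 × 2 real unitary U, 
there exist signs $s_1, s_2, s_3 \in \{1, -1\}$ and angle $\theta \in [0, 2\pi)$ such that 


$$U = s_1 \begin{pmatrix} 1 & 0 \\ 0 & s_2 \end{pmatrix} \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & s_3 \end{pmatrix}.$$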


5. Let U be a 1-qubit unitary that we would like to implement in a controlled way, i.e., we want 
to implement a map $|c\rangle|b\rangle \mapsto |c\rangle U^c|b\rangle$ for all $c, b \in \{0,1\}$ (here $U^0 = I$ and $U^1 = U$). One can 
show there exist 1-qubit unitaries A, B, and C, such that ABC = I and AXBXC = U (X 
is the NOT-gate); you may assume this without proof. Give a circuit that acts on two qubits 
and implements a controlled-U gate, using CNOTs and (uncontrolled) A, B, and C gates. 


6. (H) Let C be a given quantum circuit consisting of T many gates, which may be CNOTs and 
single-qubit gates. Show that we can implement C in a controlled way using O(T) Toffoli 
gates, CNOTs and single-qubit gates, and no auxiliary qubits other than the controlling qubit. 


7. (H) It is possible to avoid doing any intermediate measurements in a quantum circuit, using 
one auxiliary qubit for each 1-qubit measurement that needs to be delayed until the end of 
the computation. Show how. 


8. (a) Give a circuit that maps $|0^n, b\rangle \mapsto |0^n, 1 - b\rangle$ for $b \in \{0,1\}$, and that maps $|i, b\rangle \mapsto 
|i, b\rangle$ whenever $i \in \{0,1\}^n \setminus \{0^n\}$. You are allowed to use every type of elementary gate 
mentioned in the lecture notes (incl. Toffoli gates), as well as auxiliary qubits that are 
initially $|0\rangle$ and that should be put back to $|0\rangle$ at the end of the computation. 

(b) Suppose we can make queries of the type $|i, b\rangle \mapsto |i, b \oplus x_i\rangle$ to input $x \in \{0,1\}^N$, with 
$N = 2^n$. Let $x'$ be the input x with its first bit flipped (e.g., if x = 0110 then $x'$ = 1110). 
Give a circuit that implements a query to $x'$. Your circuit may use one query to x. 


(c) Give a circuit that implements a query to an input $x''$ that is obtained from x (analogously 
to (b)) by setting its first bit to 0. Your circuit may use one query to x. 


9. In Section 2.4 we showed that a standard query, which maps $|i, b\rangle \mapsto |i, b \oplus x_i\rangle$ (where 
$i \in \{0, \ldots, N - 1\}$ and $b \in \{0,1\}$), can be used to implement a phase-query to x, i.e., one of 
the type $|i\rangle \mapsto (-1)^{x_i}|i\rangle$ (this is an uncontrolled phase-query). 


(a) Show that a standard query can be implemented using one controlled phase-query to x 
(which maps $|c, i\rangle \mapsto (-1)^{c x_i}|c, i\rangle$, so the phase is added only if the control bit is c = 1), 
and possibly some auxiliary qubits and other gates. 

(b) Can you also implement a standard query using one or more uncontrolled phase-queries 
to x, and possibly some auxiliary qubits and other gates? If yes, show how. If no, prove 
why not. 


10. Suppose we have a 2-bit input $x = x_0x_1$ and a phase-query that maps 


$$O_{x,\pm} : |b\rangle \mapsto (-1)^{x_b}|b\rangle \text{ for } b \in \{0,1\}.$$


(a) Suppose we run the 1-qubit circuit $HO_{x,\pm}H$ on initial state $|0\rangle$ and then measure (in 
the computational basis). What is the probability distribution on the output bit, as a 
function of x? 


(b) Now suppose the query leaves some workspace in a second qubit, which is initially $|0\rangle$: 


$$O'_{x,\pm} : |b, 0\rangle \mapsto (-1)^{x_b}|b, b\rangle \text{ for } b \in \{0,1\}.$$


Suppose we just ignore the workspace and run the algorithm of (a) on the first qubit 
with $O'_{x,\pm}$ instead of $O_{x,\pm}$ (and $H \otimes I$ instead of H, and initial state $|00\rangle$). What is now 
the probability distribution on the output bit (i.e., if we measure the first of the two 
bits)? 

Comment: This exercise illustrates why it’s important to “clean up” (i.e., set back to $|0\rangle$) workspace 
qubits of some subroutine before running it on a superposition of inputs: the unintended entanglement 
between the address and workspace registers can thwart the intended interference effects. 


11. Give a randomized classical algorithm (i.e., one that can flip coins during its operation) that 
makes only two queries to x, and decides the Deutsch-Jozsa problem with success probability 
at least 2/3 on every possible input. A high-level description is enough, no need to write out 
the classical circuit. 


12. Suppose our N-bit input x satisfies the following promise: 
either (1) the first N/2 bits of x are all 0 and the second N/2 bits are all 1; or (2) the number 
of 1s in the first half of x plus the number of 0s in the second half, equals N/2. Modify the 
Deutsch-Jozsa algorithm to efficiently distinguish these two cases (1) and (2). 


13. (H) Let $N = 2^n$. A parity query to input $x \in \{0,1\}^N$ corresponds to the (N + 1)-qubit 
unitary map $Q_x : |y, b\rangle \mapsto |y, b \oplus (x \cdot y)\rangle$, where $x \cdot y = \sum_{i=0}^{N-1} x_i y_i \bmod 2$. For a fixed function 
$f : \{0,1\}^N \to \{0,1\}$, give a quantum algorithm that computes f(x) using only one such query 
(i.e., one application of $Q_x$), and as many elementary gates as you want. You do not need to 
give the circuit in full detail, an informal description of the algorithm is good enough. 
Chapter 3 


Simon’s Algorithm 


The Deutsch-Jozsa problem showed an exponential quantum improvement over the best determin- 
istic classical algorithms; the Bernstein-Vazirani problem showed a polynomial improvement over 
the best randomized classical algorithms that have error probability < 1/3. In this chapter we will 
combine these two features: we will see a computational problem due to Simon [230] where quantum 
computers are provably exponentially more efficient (in terms of number of queries) than bounded- 
error randomized algorithms. Simon’s problem may look rather contrived at first sight, but his 
quantum algorithm to solve it was the main inspiration for Shor’s important quantum algorithm 
for the very natural problem of integer factorization, which we will see in Chapter 5. More recently, 
Simon’s algorithm itself was also used to break some classical cryptographic systems [147, 217]. 


3.1 The problem 


Let $N = 2^n$, and identify the set $\{0, \ldots, N - 1\}$ with $\{0,1\}^n$. Let $j \oplus s$ be the n-bit string obtained 
by bitwise adding the n-bit strings j and s mod 2, so for example $00110 \oplus 10101 = 10011$. 


Simon’s problem [230]: 
For $N = 2^n$, we are given $x = (x_0, \ldots, x_{N-1})$, with $x_i \in \{0,1\}^n$, with the property that there is 
some unknown nonzero $s \in \{0,1\}^n$ such that $x_i = x_j$ iff ($i = j$ or $i = j \oplus s$). The goal is to find s. 

Note that x, viewed as a function from $\{0, \ldots, N - 1\}$ to $\{0, \ldots, N - 1\}$, is a 2-to-1 function, 
where the 2-to-1-ness is determined by the unknown mask s. The queries to the input here are 
slightly different from before: the input $x = (x_0, \ldots, x_{N-1})$ now has variables $x_i$ that themselves 
are n-bit strings, and one query gives such a string completely ($|i, 0^n\rangle \mapsto |i, x_i\rangle$). However, we can 
also view this problem as having $n2^n$ binary variables that we can query individually. Since we can 
simulate one $x_i$-query using only n binary queries (just query all n bits of $x_i$), this alternative view 
will not affect the number of queries very much. 


3.2 The quantum algorithm 


Simon’s algorithm starts out very similar to Deutsch-Jozsa: start in a state of 2n zero qubits $|0^n\rangle|0^n\rangle$ 
and apply Hadamard transforms to the first n qubits to put them in a uniform superposition, giving 


$$\frac{1}{\sqrt{2^n}} \sum_{i \in \{0,1\}^n} |i\rangle|0^n\rangle.$$
At this point, the second n-qubit register still holds only zeroes. A query turns this into 


$$\frac{1}{\sqrt{2^n}} \sum_{i \in \{0,1\}^n} |i\rangle|x_i\rangle.$$


Now the algorithm measures the second n-qubit register in the computational basis (see Exercise 1); 
this measurement is actually not necessary, but it facilitates analysis. The measurement outcome 
will be some value $x_i$ and the first register will collapse to the superposition of the two indices 
having that $x_i$-value: 


$$\frac{1}{\sqrt{2}}(|i\rangle + |i \oplus s\rangle)|x_i\rangle.$$


We will now ignore the second register and apply Hadamard transforms to the first n qubits. Using 
Equation (2.1) and the fact that $(i \oplus s) \cdot j = (i \cdot j) \oplus (s \cdot j)$, we can write the resulting state as 


$$\frac{1}{\sqrt{2^{n+1}}} \left( \sum_{j \in \{0,1\}^n} (-1)^{i \cdot j}|j\rangle + \sum_{j \in \{0,1\}^n} (-1)^{(i \oplus s) \cdot j}|j\rangle \right) = \frac{1}{\sqrt{2^{n+1}}} \sum_{j \in \{0,1\}^n} (-1)^{i \cdot j}\left(1 + (-1)^{s \cdot j}\right)|j\rangle.$$


Note that $|j\rangle$ has nonzero amplitude iff $s \cdot j = 0 \bmod 2$. Measuring the state gives a uniformly 
random element from the set $\{j \mid s \cdot j = 0 \bmod 2\}$. Accordingly, we get a linear equation that gives 
information about s. We repeat this algorithm until we have obtained n − 1 independent linear 
equations involving s. The solutions to these equations will be $0^n$ and the correct s, which we can 
compute efficiently by a classical algorithm (Gaussian elimination modulo 2). This can be done by 
means of a classical circuit of size roughly $O(n^3)$. 

Note that if the $j$’s you have generated at some point span a space of size $2^k$, for some $k < n-1$,
then the probability that your next run of the algorithm produces a $j$ that is linearly independent
of the earlier ones, is $(2^{n-1} - 2^k)/2^{n-1} \geq 1/2$. Hence an expected number of $O(n)$ runs of the
algorithm suffices to find $n-1$ linearly independent $j$’s. Simon’s algorithm thus finds $s$ using an
expected number of $O(n)$ $x_i$-queries and polynomially many other operations.
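
The classical post-processing can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the quantum part is replaced by a hypothetical stand-in `sample_j` that draws a uniformly random $j$ with $s\cdot j = 0 \bmod 2$ (which is what one run of the algorithm yields), bit strings are stored as integers, and the helper names `insert_row`, `recover_s` and `simon_classical_part` are made up for this sketch.

```python
import random

def dot_mod2(a: int, b: int) -> int:
    """Inner product modulo 2 of two bit strings stored as integers."""
    return bin(a & b).count("1") % 2

def sample_j(s: int, n: int, rng: random.Random) -> int:
    """Stand-in for one quantum run: uniform j in {0,1}^n with s.j = 0 mod 2."""
    while True:
        j = rng.randrange(2 ** n)
        if dot_mod2(s, j) == 0:
            return j

def insert_row(pivots: dict, r: int) -> bool:
    """Add r to a reduced row-echelon basis over GF(2); False if r is dependent."""
    for p, row in pivots.items():
        if (r >> p) & 1:          # clear every existing pivot column from r
            r ^= row
    if r == 0:
        return False
    p = r.bit_length() - 1        # new pivot column: leading bit of r
    for q in pivots:
        if (pivots[q] >> p) & 1:  # keep the basis fully reduced
            pivots[q] ^= r
    pivots[p] = r
    return True

def recover_s(pivots: dict, n: int) -> int:
    """Given n-1 independent equations s.j = 0, return the nonzero solution s."""
    free = next(c for c in range(n) if c not in pivots)
    s = 1 << free                 # set the free variable to 1
    for p, row in pivots.items():
        if (row >> free) & 1:     # equation reads s_p + s_free = 0 mod 2
            s |= 1 << p
    return s

def simon_classical_part(s: int, n: int, seed: int = 0):
    """Repeat runs until n-1 independent j's are found; return (s, number of runs)."""
    rng = random.Random(seed)
    pivots, runs = {}, 0
    while len(pivots) < n - 1:
        runs += 1
        insert_row(pivots, sample_j(s, n, rng))
    return recover_s(pivots, n), runs
```

Calling `simon_classical_part(0b101101, 6)` returns the hidden string together with the number of simulated runs, which is typically close to $n$.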


3.3 Classical algorithms for Simon’s problem 


3.3.1 Upper bound 


Let us first sketch a classical randomized algorithm that solves Simon’s problem using $O(\sqrt{2^n})$
queries. The algorithm is based on the so-called “birthday paradox,” which is the phenomenon 
that in a group of only 23 people, there is already a large probability that two people share the 
same birthday, despite the fact that the number of possible birthdays (365) is much larger than the 
number of people (23). The intuitive explanation is that the number of pairs of people is actually 
quadratic in the number of people, and each pair has a 1/365 probability to have the same birthday 
(assuming birthdays are distributed uniformly at random among people). Of course, different pairs
may overlap and hence are not independent, but the idea still works. 

Our algorithm will make $T$ randomly chosen distinct queries $i_1,\ldots,i_T$, for some $T$ to be
determined later. If there is a collision among those queries (i.e., $x_{i_k} = x_{i_\ell}$ for some $k \neq \ell$, so $i_k$
and $i_\ell$ happen to have the same “birthday”), then we are done, because then we know $i_k = i_\ell \oplus s$,
equivalently $s = i_k \oplus i_\ell$. There won’t be any collisions if $s = 0^n$, but how large should $T$ be such
that we are likely to see a collision in case $s \neq 0^n$? There are $\binom{T}{2} = \frac{1}{2}T(T-1) \approx T^2/2$ pairs in
our sequence that could be a collision, and since the indices are chosen randomly, the probability
for a fixed pair to form a collision is $1/(2^n - 1)$. Hence by linearity of expectation, the expected
number of collisions in our sequence will be roughly $T^2/2^{n+1}$. If we choose $T = \sqrt{2^{n+1}}$, we expect
to have roughly 1 collision in our sequence, which is good enough to find $s$. Of course, an expected
value of 1 collision does not mean that we will have at least one collision with high probability, but
a slightly more involved calculation shows the latter statement as well.
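
The collision-based algorithm is easy to try out numerically. The following Python sketch is an illustration only: `make_simon_input` and `find_s_by_collision` are hypothetical helper names, the input $x$ is stored as a list of integers, and each list lookup plays the role of one query.

```python
import random

def make_simon_input(n: int, s: int, rng: random.Random) -> list:
    """Random x with x_i = x_{i xor s}; for s = 0 this is a random permutation."""
    N = 2 ** n
    values = list(range(N))
    rng.shuffle(values)
    x = [None] * N
    for i in range(N):
        if x[i] is None:
            v = values.pop()      # a fresh value for the pair (i, i xor s)
            x[i] = v
            x[i ^ s] = v
    return x

def find_s_by_collision(x: list, T: int, rng: random.Random):
    """Make T distinct random queries; return s if a collision shows up, else None."""
    seen = {}                     # maps a seen value x_i to its index i
    for i in rng.sample(range(len(x)), T):
        if x[i] in seen:
            return i ^ seen[x[i]]  # collision: s = i_k xor i_l
        seen[x[i]] = i
    return None
```

Running `find_s_by_collision` many times with $T$ around $\sqrt{2^{n+1}}$ on inputs with $s \neq 0^n$ gives an empirical feel for how often a collision appears; a return value of `None` means that this particular run saw no collision.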


3.3.2 Lower bound 


Simon [230] proved that any classical randomized algorithm that finds $s$ with high probability needs
to make $\Omega(\sqrt{2^n})$ queries, so the above classical algorithm is essentially optimal. This was the first
proven exponential separation between quantum algorithms and classical bounded-error algorithms 
(let us stress again that this does not prove an exponential separation in the usual circuit model, 
because we are counting queries rather than ordinary operations here). Simon’s algorithm inspired 
Shor to his factoring algorithm, which we describe in Chapter 5. 

We will prove the classical lower bound for a decision version of Simon’s problem: 


Given: input $x = (x_0,\ldots,x_{N-1})$, where $N = 2^n$ and $x_i \in \{0,1\}^n$
Promise: $\exists s \in \{0,1\}^n$ such that: $x_i = x_j$ iff ($i = j$ or $i = j \oplus s$)
Task: decide whether $s = 0^n$


Consider the input distribution $\mu$ that is defined as follows. With probability 1/2, $x$ is a uniformly
random permutation of $\{0,1\}^n$; this corresponds to the case $s = 0^n$. With probability 1/2, we pick
a nonzero string $s$ at random, and for each pair $(i, i \oplus s)$, we pick a unique value for $x_i = x_{i\oplus s}$ at
random. If there exists a randomized $T$-query algorithm that achieves success probability $\geq 2/3$
under this input distribution $\mu$, then there also is a deterministic $T$-query algorithm that achieves
success probability $\geq 2/3$ under $\mu$ (because the behavior of the randomized algorithm is an average
over a number of deterministic algorithms). Now consider a deterministic algorithm with error
$\leq 1/3$ under $\mu$, that makes $T$ queries to $x$. We want to show that $T = \Omega(\sqrt{2^n})$.

First consider the case $s = 0^n$. We can assume the algorithm never queries the same point
twice. Then the $T$ outcomes of the queries are $T$ distinct $n$-bit strings, and each sequence of $T$
strings is equally likely.

Now consider the case $s \neq 0^n$. Suppose the algorithm queries the indices $i_1,\ldots,i_T$ (this sequence
depends on $x$) and gets outputs $x_{i_1},\ldots,x_{i_T}$. Call a sequence of queries $i_1,\ldots,i_T$ good if it shows
a collision (i.e., $x_{i_k} = x_{i_\ell}$ for some $k \neq \ell$), and bad otherwise. If the sequence of queries of the
algorithm is good, then we can find $s$, since $i_k \oplus i_\ell = s$. On the other hand, if the sequence is bad,
then each sequence of $T$ distinct outcomes is equally likely, just as in the $s = 0^n$ case! We will
now show that the probability of the bad case is very close to 1 for small $T$.

If $i_1,\ldots,i_{k-1}$ is bad, then we have excluded at most $\binom{k-1}{2}$ possible values of $s$ (namely all values
$i_j \oplus i_{j'}$ for all distinct $j, j' \in [k-1]$), and all other values of $s$ are equally likely. The probability
that the next query $i_k$ makes the sequence good, is the probability that $x_{i_k} = x_{i_j}$ for some $j < k$,
equivalently, that the set $S_k = \{i_k \oplus i_j \mid j < k\}$ happens to contain the string $s$. But $S_k$ has only
$k-1$ members, while there are at least $2^n - 1 - \binom{k-1}{2}$ equally likely remaining possibilities for $s$.
This means that the probability that the sequence is still bad after query $i_k$ is made, is very close
to 1. In formulas:

\[ \Pr[i_1,\ldots,i_T \text{ is bad}] = \prod_{k=2}^{T} \Pr[i_1,\ldots,i_k \text{ is bad} \mid i_1,\ldots,i_{k-1} \text{ is bad}]
 \geq \prod_{k=2}^{T} \left(1 - \frac{k-1}{2^n - 1 - \binom{k-1}{2}}\right)
 \geq 1 - \sum_{k=2}^{T} \frac{k-1}{2^n - 1 - \binom{k-1}{2}}. \]

Here we used the fact that $(1-a)(1-b) \geq 1 - (a+b)$ if $a, b \geq 0$. Note that $\sum_{k=2}^{T}(k-1) =
T(T-1)/2 \approx T^2/2$, and $2^n - 1 - \binom{k-1}{2} \approx 2^n$ as long as $k \ll \sqrt{2^n}$. Hence we can approximate the
last formula by $1 - T^2/2^{n+1}$. Accordingly, if $T \ll \sqrt{2^n}$ then with probability nearly 1 (probability
taken over the input distribution $\mu$) the algorithm’s sequence of queries is bad. If it gets a bad
sequence, it cannot “see” the difference between the $s = 0^n$ case and the $s \neq 0^n$ case, since both
cases result in a uniformly random sequence of $T$ distinct $n$-bit strings as answers to the $T$ queries.
This shows that $T$ has to be $\Omega(\sqrt{2^n})$ in order to enable the algorithm to get a good sequence of
queries with high probability.


Exercises 


1. Give the projectors of the $2^n$-outcome projective measurement that is applied to the whole
$2n$-qubit state in Simon’s algorithm right after the query.


2. Analyze the different steps of Simon’s algorithm if $s = 0^n$ (so all $x_i$-values are distinct), and
show that the final output $j$ is uniformly distributed over $\{0,1\}^n$.


3. Suppose we run Simon’s algorithm on the following input $x$ (with $N = 8$ and hence $n = 3$):

$x_{000} = x_{111} = 000$
$x_{001} = x_{110} = 001$
$x_{010} = x_{101} = 010$
$x_{011} = x_{100} = 011$

Note that $x$ is 2-to-1 and $x_i = x_{i\oplus 111}$ for all $i \in \{0,1\}^3$, so $s = 111$.

(a) Give the starting state of Simon’s algorithm.

(b) Give the state after the first Hadamard transforms on the first 3 qubits.

(c) Give the state after applying the oracle query.

(d) Give the state after measuring the second register (suppose the measurement gave $|001\rangle$).

(e) Using $H^{\otimes n}|i\rangle = \frac{1}{\sqrt{2^n}} \sum_{j\in\{0,1\}^n} (-1)^{i\cdot j}|j\rangle$, give the state after the final Hadamards.

(f) Why does a measurement of the first 3 qubits of the final state give information about $s$?

(g) Suppose the first run of the algorithm gives $j = 011$ and a second run gives $j = 101$.
Show that, assuming $s \neq 000$, those two runs of the algorithm already determine $s$.


4. Consider the following generalization of Simon’s problem: the input is $x = (x_0,\ldots,x_{N-1})$,
with $N = 2^n$ and $x_i \in \{0,1\}^n$, with the property that there is some unknown subspace $V \subseteq
\{0,1\}^n$ (where $\{0,1\}^n$ is the vector space of $n$-bit strings with entrywise addition modulo 2)
such that $x_i = x_j$ iff there exists a $v \in V$ such that $i = j \oplus v$. The usual definition of Simon’s
problem corresponds to the case of a 1-dimensional subspace $V = \{0, s\}$.

Show that one run of Simon’s algorithm now produces a $j \in \{0,1\}^n$ that is orthogonal to the
whole subspace (i.e., $j\cdot v = 0 \bmod 2$ for every $v \in V$).


5. Let $f : \{0,1\}^n \to \{0,1\}^{n-1}$ be a 2-to-1 function, meaning that every $y \in \{0,1\}^{n-1}$ has exactly
two distinct pre-images $x, x' \in \{0,1\}^n$. Suppose there is an efficient quantum circuit (i.e.,
with number of elementary gates that’s polynomial in $n$) to compute $f$, but no efficient circuit
that can produce from given $x \in \{0,1\}^n$ an $x' \neq x$ such that $f(x) = f(x')$.

Show how a quantum computer can efficiently generate a uniformly random $y \in \{0,1\}^{n-1}$
and an associated $n$-qubit state $|\phi_y\rangle$ such that:

(1) when asked, from $|\phi_y\rangle$ you can efficiently generate an $x$ such that $f(x) = y$;

and

(2) when asked, you can efficiently sample uniformly from the set
$\{a \in \{0,1\}^n : a\cdot(x \oplus x') = 0 \bmod 2\}$, where $x$ and $x'$ are the two pre-images of $y$.

Comment: You’re not supposed to do both tasks (1) and (2) one after another, only either one of the two
(whichever you’re asked to do). This problem may look arbitrary but was recently used to design an efficient
protocol through which a classical computer can efficiently verify that a quantum computer works as
intended [184].


6. (a) Suppose $x$ is an $N$-bit string. What happens if we apply a Hadamard transform to each
qubit of the $N$-qubit state $\frac{1}{\sqrt{2^N}} \sum_{y\in\{0,1\}^N} (-1)^{x\cdot y}|y\rangle$?

(b) Give a quantum algorithm that uses $T$ queries to $N$-bit string $x$, and that maps $|y\rangle \mapsto
(-1)^{x\cdot y}|y\rangle$ for every $y \in \{0,1\}^N$ that contains at most $T$ 1s (i.e., for every $y$ of Hamming
weight $\leq T$). You can argue on a high level, no need to write out circuits in detail.

(c) (H) Give a quantum algorithm that with high probability outputs $x$, using at most
$N/2 + 2\sqrt{N}$ queries to $x$.

(d) Argue that a classical algorithm needs at least $N$ queries in order to have success probability
$> 1/2$ of outputting the correct $x$.


Chapter 4


The Fourier Transform 


4.1 The classical discrete Fourier transform 


The Fourier transform occurs in many different versions throughout classical computing, in areas 
ranging from signal-processing to data compression to complexity theory. 

For our purposes, the Fourier transform is going to be an $N \times N$ unitary matrix, all of whose
entries have the same magnitude. For $N = 2$, it’s just our familiar Hadamard transform:

\[ F_2 = H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}. \]


Doing something similar in 3 dimensions is impossible with real numbers: we can’t give three
orthogonal vectors in $\{+1,-1\}^3$. However, using complex numbers allows us to define the Fourier
transform for any $N$. Let $\omega_N = e^{2\pi i/N}$ be an $N$-th root of unity (“root of unity” means that $\omega_N^k = 1$
for some integer $k$, in this case $k = N$). The rows of the matrix will be indexed by $j \in \{0,\ldots,N-1\}$
and the columns by $k \in \{0,\ldots,N-1\}$. Define the $(j,k)$-entry of the matrix $F_N$ by $\frac{1}{\sqrt{N}}\omega_N^{jk}$, where
the exponent $jk$ is the usual product of two integers:

\[ F_N = \frac{1}{\sqrt{N}} \begin{pmatrix} & \vdots & \\ \cdots & \omega_N^{jk} & \cdots \\ & \vdots & \end{pmatrix}. \]

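This $F_N$ is a unitary matrix, because each column has norm 1 and any two distinct columns (say
those indexed by $k$ and $k'$) are orthogonal:

\[ \sum_{j=0}^{N-1} \frac{1}{\sqrt{N}} (\omega_N^{jk})^* \frac{1}{\sqrt{N}} \omega_N^{jk'} = \frac{1}{N} \sum_{j=0}^{N-1} \omega_N^{j(k'-k)} =
 \begin{cases} 1 & \text{if } k = k' \\ 0 & \text{otherwise} \end{cases} \]

using the formula for geometric sums from Appendix B.1.
Since $F_N$ is unitary and symmetric, the inverse $F_N^{-1} = F_N^*$ only differs from $F_N$ by having minus
signs in the exponent of the entries. For a vector $v \in \mathbb{R}^N$, the vector $\hat{v} = F_N v$ is called the Fourier
transform of $v$.¹ Its entries are given by $\hat{v}_j = \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} \omega_N^{jk} v_k$.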
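
As a quick numerical sanity check of the column orthogonality, one can build $F_N$ explicitly for a small $N$. The Python sketch below is illustrative only; `fourier_matrix` and `column_inner_product` are hypothetical helper names, and the matrix is stored as a plain list of rows.

```python
import cmath
import math

def fourier_matrix(N: int) -> list:
    """The N x N matrix with (j, k)-entry omega_N^{jk} / sqrt(N)."""
    return [[cmath.exp(2j * cmath.pi * ((j * k) % N) / N) / math.sqrt(N)
             for k in range(N)]
            for j in range(N)]

def column_inner_product(F: list, k1: int, k2: int) -> complex:
    """Inner product of columns k1 and k2 (conjugating the first column)."""
    return sum(F[j][k1].conjugate() * F[j][k2] for j in range(len(F)))

F = fourier_matrix(6)
for k1 in range(6):
    for k2 in range(6):
        expected = 1.0 if k1 == k2 else 0.0
        # Columns are orthonormal up to floating-point rounding.
        assert abs(column_inner_product(F, k1, k2) - expected) < 1e-9
```

For $N = 2$ the same function reproduces the Hadamard matrix, again up to rounding errors of order $10^{-16}$.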


¹The literature on Fourier analysis usually talks about the Fourier transform of a function rather than of a vector,


4.2 The Fast Fourier Transform


The naive way of computing the Fourier transform $\hat{v} = F_N v$ of $v \in \mathbb{R}^N$ just does the matrix-vector
multiplication to compute all the entries of $\hat{v}$. This would take $O(N)$ steps (additions and
multiplications) per entry, and $O(N^2)$ steps to compute the whole vector $\hat{v}$. However, there is a
more efficient way of computing $\hat{v}$. This algorithm is called the Fast Fourier Transform (FFT, due
to Cooley and Tukey in 1965 [93]), and takes only $O(N \log N)$ steps. This difference between the
quadratic $N^2$ steps and the near-linear $N \log N$ is tremendously important in practice when $N$ is
large, and is the main reason that Fourier transforms are so widely used.

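We will assume $N = 2^n$, which is usually fine because we can add zeroes to our vector to make
its dimension a power of 2 (but similar FFTs can be given also directly for most $N$ that aren’t a
power of 2). The key to the FFT is to rewrite the entries of $\hat{v}$ as follows:

\[ \hat{v}_j = \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} \omega_N^{jk} v_k
 = \frac{1}{\sqrt{N}} \left( \sum_{\text{even } k} \omega_N^{jk} v_k + \omega_N^{j} \sum_{\text{odd } k} \omega_N^{j(k-1)} v_k \right)
 = \frac{1}{\sqrt{2}} \left( \frac{1}{\sqrt{N/2}} \sum_{\text{even } k} \omega_{N/2}^{jk/2} v_k + \omega_N^{j} \frac{1}{\sqrt{N/2}} \sum_{\text{odd } k} \omega_{N/2}^{j(k-1)/2} v_k \right). \]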


Note that we’ve rewritten the entries of the $N$-dimensional Fourier transform $\hat{v}$ in terms of two
$N/2$-dimensional Fourier transforms, one of the even-numbered entries of $v$, and one of the
odd-numbered entries of $v$.


This suggests a recursive procedure for computing $\hat{v}$: first separately compute the Fourier transform
$\hat{v}_{\text{even}}$ of the $N/2$-dimensional vector of even-numbered entries of $v$ and the Fourier transform
$\hat{v}_{\text{odd}}$ of the $N/2$-dimensional vector of odd-numbered entries of $v$, and then compute the $N$ entries

\[ \hat{v}_j = \frac{1}{\sqrt{2}} \left( \hat{v}_{\text{even},j} + \omega_N^{j}\, \hat{v}_{\text{odd},j} \right). \]

Strictly speaking this is not well-defined, because $\hat{v}_{\text{even}}$ and $\hat{v}_{\text{odd}}$ are just $N/2$-dimensional vectors.
However, if we take two copies of these $N/2$-dimensional vectors to get an $N$-dimensional vector,
defining $\hat{v}_{\text{even},j+N/2} = \hat{v}_{\text{even},j}$ (and similarly for $\hat{v}_{\text{odd}}$), then it all works out.


The time $T(N)$ it takes to implement $F_N$ this way can be written recursively as $T(N) =
2T(N/2) + O(N)$, because we need to compute two $N/2$-dimensional Fourier transforms and do
$O(N)$ additional operations to compute $\hat{v}$. This recursion works out to time $T(N) = O(N \log N)$,
as promised. Similarly, we have an equally efficient algorithm for the inverse Fourier transform
$F_N^{-1} = F_N^*$, whose entries are $\frac{1}{\sqrt{N}} \omega_N^{-jk}$.
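
The recursion can be written down almost verbatim. The Python sketch below is a minimal illustration, not an optimized implementation: it assumes the length of the input is a power of 2, uses the unitary normalization of this chapter (a factor $1/\sqrt{2}$ per level of recursion), and includes a hypothetical helper `naive_ft` for comparison with the $O(N^2)$ matrix-vector product.

```python
import cmath
import math

def fft(v: list) -> list:
    """Unitary Fourier transform of v (length a power of 2), computed recursively."""
    N = len(v)
    if N == 1:
        return [complex(v[0])]
    even = fft(v[0::2])           # transform of the even-numbered entries
    odd = fft(v[1::2])            # transform of the odd-numbered entries
    half = N // 2
    out = []
    for j in range(N):
        w = cmath.exp(2j * cmath.pi * j / N)        # omega_N^j
        # Indices are taken modulo N/2: the "two copies" convention.
        out.append((even[j % half] + w * odd[j % half]) / math.sqrt(2))
    return out

def naive_ft(v: list) -> list:
    """Direct O(N^2) evaluation of the same transform, for comparison."""
    N = len(v)
    return [sum(cmath.exp(2j * cmath.pi * j * k / N) * v[k] for k in range(N))
            / math.sqrt(N) for j in range(N)]
```

On a small vector such as `[1, 2, 3, 4, 0, -1, 2, 5]` the two functions agree up to floating-point rounding.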


but on finite domains that’s just a notational variant of what we do here: a vector $v \in \mathbb{R}^N$ can also be viewed as a
function $v : \{0,\ldots,N-1\} \to \mathbb{R}$ defined by $v(i) = v_i$. Also, in the classical literature people sometimes use the term
“Fourier transform” for what we call the inverse Fourier transform.


4.3 Application: multiplying two polynomials


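Suppose we are given two real-valued polynomials $p$ and $q$, each of degree at most $d$:

\[ p(x) = \sum_{j=0}^{d} a_j x^j \quad \text{and} \quad q(x) = \sum_{k=0}^{d} b_k x^k. \]

We would like to compute the product of these two polynomials, which is

\[ (p \cdot q)(x) = \left( \sum_{j=0}^{d} a_j x^j \right) \left( \sum_{k=0}^{d} b_k x^k \right) = \sum_{\ell=0}^{2d} \underbrace{\left( \sum_{j=0}^{2d} a_j b_{\ell-j} \right)}_{c_\ell} x^\ell, \]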


where implicitly we set $a_j = b_j = 0$ for $j > d$ and $b_{\ell-j} = 0$ if $j > \ell$. Clearly, each coefficient $c_\ell$ by
itself takes $O(d)$ steps (additions and multiplications) to compute, which suggests an algorithm for
computing the coefficients of $p \cdot q$ that takes $O(d^2)$ steps. However, using the fast Fourier transform
we can do this in $O(d \log d)$ steps, as follows.

The convolution of two vectors $a, b \in \mathbb{R}^N$ is a vector $a * b \in \mathbb{R}^N$ whose $\ell$-th entry is defined
by $(a * b)_\ell = \frac{1}{\sqrt{N}} \sum_{j=0}^{N-1} a_j b_{(\ell-j) \bmod N}$. Let us set $N = 2d + 1$ (the number of nonzero coefficients
of $p \cdot q$) and make the above $(d+1)$-dimensional vectors of coefficients $a$ and $b$ $N$-dimensional by
adding $d$ zeroes. Then the coefficients of the polynomial $p \cdot q$ are proportional to the entries of the
convolution: $c_\ell = \sqrt{N}(a * b)_\ell$. It is easy to show that the Fourier coefficients of the convolution of
$a$ and $b$ are the products of the Fourier coefficients of $a$ and $b$: for every $\ell \in \{0,\ldots,N-1\}$ we have
$\widehat{(a * b)}_\ell = \hat{a}_\ell \cdot \hat{b}_\ell$. This immediately suggests an algorithm for computing the vector of coefficients $c_\ell$:
apply the FFT to $a$ and $b$ to get $\hat{a}$ and $\hat{b}$, multiply those two vectors entrywise to get $\widehat{a * b}$, apply
the inverse FFT to get $a * b$, and finally multiply $a * b$ with $\sqrt{N}$ to get the vector $c$ of the coefficients
of $p \cdot q$. Since the FFTs and their inverse take $O(N \log N)$ steps, and pointwise multiplication of
two $N$-dimensional vectors takes $O(N)$ steps, this algorithm takes $O(N \log N) = O(d \log d)$ steps.
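
The four steps (FFT, entrywise product, inverse FFT, rescaling by $\sqrt{N}$) can be sketched as follows. This is an illustration under a few assumptions that are not in the text: the vectors are padded to the next power of 2 at least $2d+1$ (rather than to exactly $2d+1$) so that a simple radix-2 recursion applies, the coefficients are integers so that the result can be rounded, and the function names are made up for this sketch.

```python
import cmath
import math

def fft(v: list) -> list:
    """Unitary Fourier transform for lengths that are powers of 2."""
    N = len(v)
    if N == 1:
        return [complex(v[0])]
    even, odd = fft(v[0::2]), fft(v[1::2])
    half = N // 2
    return [(even[j % half] + cmath.exp(2j * cmath.pi * j / N) * odd[j % half])
            / math.sqrt(2) for j in range(N)]

def inverse_fft(u: list) -> list:
    """Inverse transform via conjugation: F^{-1} u = conj(F conj(u))."""
    return [z.conjugate() for z in fft([z.conjugate() for z in u])]

def multiply_polynomials(a: list, b: list) -> list:
    """Coefficients of p*q from integer coefficient lists a and b (lowest degree first)."""
    size = len(a) + len(b) - 1
    N = 1
    while N < size:               # pad to a power of 2 (assumption of this sketch)
        N *= 2
    fa = fft(a + [0] * (N - len(a)))
    fb = fft(b + [0] * (N - len(b)))
    conv = inverse_fft([x * y for x, y in zip(fa, fb)])
    # c_l = sqrt(N) * (a * b)_l; rounding is valid for integer coefficients only.
    return [round((math.sqrt(N) * c).real) for c in conv[:size]]
```

For example, `multiply_polynomials([1, 2, 3], [4, 5])` computes the coefficients of $(1 + 2x + 3x^2)(4 + 5x)$.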

Note that if two numbers $a_d \cdots a_1 a_0$ and $b_d \cdots b_1 b_0$ are given in decimal notation, then we can
interpret their digits as coefficients of single-variate degree-$d$ polynomials $p$ and $q$, respectively:
$p(x) = \sum_{j=0}^{d} a_j x^j$ and $q(x) = \sum_{k=0}^{d} b_k x^k$. The two numbers will now be $p(10)$ and $q(10)$. Their
product is the evaluation of the product-polynomial $p \cdot q$ at the point $x = 10$. This suggests that
we can use the above procedure (for fast multiplication of polynomials) to multiply two numbers in
$O(d \log d)$ steps, which would be a lot faster than the standard $O(d^2)$ algorithm for multiplication
that one learns in primary school. However, in this case we have to be careful since the steps of the
above algorithm are themselves multiplications between numbers, which we cannot count at unit
cost anymore if our goal is to implement a multiplication between numbers! Still, it turns out that
implementing this idea carefully allows one to multiply two $d$-digit numbers in $O(d \log d \log\log d)$
elementary operations. This is known as the Schönhage-Strassen algorithm [218] (slightly improved
further by Fürer [116] and Harvey and van der Hoeven [135]), and is one of the ingredients of Shor’s
algorithm in the next chapter. We’ll skip the details.


4.4 The quantum Fourier transform 


Since $F_N$ is an $N \times N$ unitary matrix, we can interpret it as a quantum operation, mapping an
$N$-dimensional vector of amplitudes to another $N$-dimensional vector of amplitudes. This is called the
quantum Fourier transform (QFT). In case $N = 2^n$ (which is the only case we will care about), this
will be an $n$-qubit unitary. Notice carefully that this quantum operation does something different
from the classical Fourier transform: in the classical case we are given a vector $v$, written on a piece
of paper so to say, and we compute the vector $\hat{v} = F_N v$, and also write the result on a piece of
paper. In the quantum case, we are working on quantum states; these are vectors of amplitudes, but
we don’t have those written down anywhere; they only exist as the amplitudes in a superposition.
We will see below that the QFT can be implemented by a quantum circuit using $O(n^2)$ elementary
gates. This is exponentially faster than even the FFT (which takes $O(N \log N) = O(2^n n)$ steps),
but it achieves something different: computing the QFT won’t give us the entries of the Fourier
transform written down on a piece of paper, but only as the amplitudes of the resulting state.


4.5 An efficient quantum circuit 


Here we will describe the efficient circuit for the $n$-qubit QFT. The elementary gates we will allow
ourselves are Hadamards and controlled-$R_s$ gates, where

\[ R_s = \begin{pmatrix} 1 & 0 \\ 0 & e^{2\pi i/2^s} \end{pmatrix}. \]

Note that $R_1 = Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$, $R_2 = \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix}$. For large $s$, $e^{2\pi i/2^s}$ is close to 1 and hence
the $R_s$-gate is close to the identity-gate $I$. We could implement $R_s$-gates using Hadamards and
controlled-$R_{1/2/3}$ gates, but for simplicity we will just treat each $R_s$ as an elementary gate.

Since the QFT is linear, it suffices if our circuit implements it correctly on all $n$-qubit basis
states $|k\rangle$, i.e., it should map

\[ |k\rangle \mapsto F_N|k\rangle = \frac{1}{\sqrt{N}} \sum_{j=0}^{N-1} e^{2\pi i jk/N} |j\rangle. \]

The key to doing this efficiently is to rewrite $F_N|k\rangle$, which turns out to be a product state (so $F_N$
does not introduce entanglement when applied to a basis state $|k\rangle$), as follows. Let $|k\rangle = |k_1 \ldots k_n\rangle$,
$k_1$ being the most significant bit. Note that for integer $j = j_1 \ldots j_n$, we can write $j/2^n = \sum_{\ell=1}^{n} j_\ell 2^{-\ell}$.
For example, binary 0.101 is $1\cdot 2^{-1} + 0\cdot 2^{-2} + 1\cdot 2^{-3} = 5/8$. We have the following sequence of
equalities (which is probably most easily verified by working backwards from the last formula):


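\[ F_N|k\rangle = \frac{1}{\sqrt{N}} \sum_{j=0}^{N-1} e^{2\pi i jk/2^n} |j\rangle \]
\[ = \frac{1}{\sqrt{N}} \sum_{j=0}^{N-1} e^{2\pi i \left(\sum_{\ell=1}^{n} j_\ell 2^{-\ell}\right) k} |j_1 \ldots j_n\rangle \]
\[ = \frac{1}{\sqrt{N}} \sum_{j\in\{0,1\}^n} \prod_{\ell=1}^{n} e^{2\pi i j_\ell k/2^\ell} |j_1 \ldots j_n\rangle \]
\[ = \bigotimes_{\ell=1}^{n} \frac{1}{\sqrt{2}} \left( |0\rangle + e^{2\pi i k/2^\ell} |1\rangle \right). \]
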
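Note that $e^{2\pi i k/2^\ell} = e^{2\pi i\, k_1 \ldots k_{n-\ell}.k_{n-\ell+1} \ldots k_n} = e^{2\pi i\, 0.k_{n-\ell+1} \ldots k_n}$; the $n - \ell$ most significant bits of $k$
don’t matter for this value, because $e^{2\pi i m} = 1$ if $m$ is an integer.

As an example, for $n = 3$ we have the 3-qubit product state

\[ F_8|k_1 k_2 k_3\rangle = \frac{1}{\sqrt{2}} \left( |0\rangle + e^{2\pi i\, 0.k_3} |1\rangle \right) \otimes \frac{1}{\sqrt{2}} \left( |0\rangle + e^{2\pi i\, 0.k_2 k_3} |1\rangle \right) \otimes \frac{1}{\sqrt{2}} \left( |0\rangle + e^{2\pi i\, 0.k_1 k_2 k_3} |1\rangle \right). \]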
This example suggests what the circuit should be. To prepare the first qubit of the desired state
$F_8|k_1 k_2 k_3\rangle$, we can just apply a Hadamard to $|k_3\rangle$, giving state $\frac{1}{\sqrt{2}}(|0\rangle + (-1)^{k_3}|1\rangle)$, and observe that
$(-1)^{k_3} = e^{2\pi i\, 0.k_3}$. To prepare the second qubit of the desired state, apply a Hadamard to $|k_2\rangle$, giving
$\frac{1}{\sqrt{2}}(|0\rangle + e^{2\pi i\, 0.k_2}|1\rangle)$, and then conditioned on $k_3$ (before we apply the Hadamard to $|k_3\rangle$) apply $R_2$.
This multiplies $|1\rangle$ with a phase $e^{2\pi i\, 0.0k_3}$, producing the correct qubit $\frac{1}{\sqrt{2}}(|0\rangle + e^{2\pi i\, 0.k_2 k_3}|1\rangle)$. Finally,
to prepare the third qubit of the desired state, we apply a Hadamard to $|k_1\rangle$, apply $R_2$ conditioned
on $k_2$, and $R_3$ conditioned on $k_3$. This produces the correct qubit $\frac{1}{\sqrt{2}}(|0\rangle + e^{2\pi i\, 0.k_1 k_2 k_3}|1\rangle)$. We have
now produced all three qubits of the desired state $F_8|k_1 k_2 k_3\rangle$, but in the wrong order: the first
qubit should be the third and vice versa. So the final step is just to swap qubits 1 and 3. Figure 4.1
illustrates the circuit in the case $n = 3$. Here the black circles indicate the control-qubits for each
of the controlled-$R_s$ operations, and the operation at the end of the circuit swaps qubits 1 and 3.
The general case works analogously: starting with $\ell = 1$, we apply a Hadamard to $|k_\ell\rangle$ and then
“rotate in” the additional phases required, conditioned on the values of the later bits $k_{\ell+1} \ldots k_n$.
Some swap gates at the end then put the qubits in the right order.²


Figure 4.1: The circuit for the 3-qubit QFT 


Since the circuit involves $n$ qubits, and at most $n$ gates are applied to each qubit, the overall
circuit uses at most $n^2$ gates. In fact, many of those gates are phase gates $R_s$ with $s \gg \log n$, which
are very close to the identity and hence don’t do much anyway. As observed by Coppersmith [94],
we can actually omit those from the circuit, keeping only $O(\log n)$ gates per qubit and $O(n \log n)$
gates overall. Intuitively, the overall error caused by these omissions will be small (Exercise 4 asks
you to make this precise). Finally, note that by inverting the circuit (i.e., reversing the order of
the gates and taking the adjoint $U^*$ of each gate $U$) we obtain an equally efficient circuit for the
inverse Fourier transform $F_N^{-1} = F_N^*$.
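
The circuit described in this section can be checked with a small classical state-vector simulation. The Python sketch below is an illustration only (and of course takes time exponential in $n$): the state is a list of $2^n$ amplitudes, qubit 1 is the most significant bit of the index, and the function names are hypothetical. The gate order follows the description above: for $\ell = 1,\ldots,n$ a Hadamard on qubit $\ell$, then $R_{m-\ell+1}$ on qubit $\ell$ controlled by each later qubit $m$, and finally swaps that reverse the qubit order.

```python
import cmath
import math

def apply_hadamard(state: list, n: int, q: int) -> list:
    """Hadamard on qubit q (1 = most significant) of an n-qubit state vector."""
    bit = 1 << (n - q)
    new = state[:]
    for idx in range(len(state)):
        if not idx & bit:
            a, b = state[idx], state[idx | bit]
            new[idx] = (a + b) / math.sqrt(2)
            new[idx | bit] = (a - b) / math.sqrt(2)
    return new

def apply_controlled_R(state: list, n: int, control: int, target: int, s: int) -> list:
    """Controlled-R_s: phase e^{2 pi i / 2^s} when control and target are both 1."""
    cb, tb = 1 << (n - control), 1 << (n - target)
    phase = cmath.exp(2j * cmath.pi / 2 ** s)
    return [amp * phase if (idx & cb) and (idx & tb) else amp
            for idx, amp in enumerate(state)]

def apply_swap(state: list, n: int, q1: int, q2: int) -> list:
    """Swap qubits q1 and q2."""
    b1, b2 = 1 << (n - q1), 1 << (n - q2)
    new = state[:]
    for idx in range(len(state)):
        if bool(idx & b1) != bool(idx & b2):
            new[idx ^ b1 ^ b2] = state[idx]
    return new

def qft_circuit(state: list, n: int) -> list:
    """Apply the n-qubit QFT circuit gate by gate."""
    for l in range(1, n + 1):
        state = apply_hadamard(state, n, l)
        for m in range(l + 1, n + 1):
            state = apply_controlled_R(state, n, control=m, target=l, s=m - l + 1)
    for q in range(1, n // 2 + 1):    # reverse the order of the qubits
        state = apply_swap(state, n, q, n + 1 - q)
    return state
```

Applying `qft_circuit` to a basis state $|k\rangle$ should reproduce the amplitudes $e^{2\pi i jk/N}/\sqrt{N}$ up to floating-point rounding, which is a convenient way to test the gate order for small $n$.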


4.6 Application: phase estimation 


An important application of the QFT is in phase estimation. This was originally due to Kitaev [155],
it was put in a broader context by Cleve et al. [91], and is now a very common subroutine
in many quantum algorithms.

²We can implement a SWAP-gate using CNOTs (Exercise 2.3); CNOTs can in turn be constructed
from Hadamard and controlled-$R_1$ (= controlled-$Z$) gates, which are in the allowed set of elementary gates here.

Suppose we can apply a unitary $U$ and we are given an eigenvector $|\psi\rangle$ of $U$ with corresponding
unknown eigenvalue $\lambda$ (i.e., $U|\psi\rangle = \lambda|\psi\rangle$), and we would like to compute or at least approximate
the $\lambda$. Since $U$ is unitary, $\lambda$ must have magnitude 1, so we can write it as $\lambda = e^{2\pi i\phi}$ for some real
number $\phi \in [0,1)$; the only thing that matters is this phase $\phi$. Suppose for simplicity that we know
that $\phi = 0.\phi_1 \ldots \phi_n$ can be written exactly with $n$ bits of precision. Then here’s the algorithm for
phase estimation:


1. Start with $|0^n\rangle|\psi\rangle$.


2. For $N = 2^n$, apply $F_N$ to the first $n$ qubits to get $\frac{1}{\sqrt{2^n}} \sum_{j=0}^{N-1} |j\rangle|\psi\rangle$
(in fact, $H^{\otimes n} \otimes I$ would have the same effect).


3. Apply the map $|j\rangle|\psi\rangle \mapsto |j\rangle U^j|\psi\rangle = e^{2\pi i\phi j}|j\rangle|\psi\rangle$. In other words, apply $U$ to the second
register for a number of times given by the first register.


4. Apply the inverse Fourier transform $F_N^{-1}$ to the first $n$ qubits and measure the result.


Note that after step 3, the first $n$ qubits are in state $\frac{1}{\sqrt{N}} \sum_{j=0}^{N-1} e^{2\pi i\phi j}|j\rangle = F_N|2^n\phi\rangle$, hence (under
the assumption that $\phi$ can be written exactly with $n$ bits) the inverse Fourier transform is going
to give us $|2^n\phi\rangle = |\phi_1 \ldots \phi_n\rangle$ with probability 1.

In case $\phi$ cannot be written exactly with $n$ bits of precision, then one can show that this
procedure still (with high probability) spits out a good $n$-bit approximation to $\phi$. We’ll omit the
calculation.
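
The outcome distribution of the four steps above is easy to compute classically for small $n$, which gives a feel for both the exact and the inexact case. The Python sketch below is only an illustration: it tracks the first register alone (the eigenvector register just contributes the phases $e^{2\pi i\phi j}$), applies the inverse Fourier transform by direct summation, and uses a hypothetical function name.

```python
import cmath
import math

def phase_estimation_distribution(phi: float, n: int) -> list:
    """Probabilities of the 2^n measurement outcomes for eigenphase phi."""
    N = 2 ** n
    # State of the first register after step 3.
    after_step3 = [cmath.exp(2j * cmath.pi * phi * j) / math.sqrt(N) for j in range(N)]
    # Step 4: inverse Fourier transform, with entries omega_N^{-jk} / sqrt(N).
    final = [sum(cmath.exp(-2j * cmath.pi * j * k / N) * after_step3[j]
                 for j in range(N)) / math.sqrt(N)
             for k in range(N)]
    return [abs(amp) ** 2 for amp in final]

probs = phase_estimation_distribution(5 / 8, 3)   # phi = 0.101 in binary
best = max(range(len(probs)), key=lambda k: probs[k])
# best / 2^n is the estimate of phi read off from the measurement outcome.
```

For a phase with an exact $n$-bit expansion, all probability sits on the outcome $2^n\phi$; for other phases the distribution is peaked around the nearest $n$-bit approximations.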


Exercises 
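1. For $\omega = e^{2\pi i/3}$ and $F_3 = \frac{1}{\sqrt{3}} \begin{pmatrix} 1 & 1 & 1 \\ 1 & \omega & \omega^2 \\ 1 & \omega^2 & \omega \end{pmatrix}$, calculate $F_3 \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}$ and $F_3 \begin{pmatrix} 1 \\ \omega^2 \\ \omega \end{pmatrix}$.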


2. Prove that the Fourier coefficients of the convolution of vectors $a$ and $b$ are the product of
the Fourier coefficients of $a$ and $b$. In other words, prove that for every $a, b \in \mathbb{R}^N$ and every
$\ell \in \{0,\ldots,N-1\}$ we have $\widehat{(a * b)}_\ell = \hat{a}_\ell \cdot \hat{b}_\ell$. Here the Fourier transform $\hat{a}$ is defined as the
vector $F_N a$, and the $\ell$-entry of the convolution-vector $a * b$ is $(a * b)_\ell = \frac{1}{\sqrt{N}} \sum_{j=0}^{N-1} a_j b_{(\ell-j) \bmod N}$.


3. (H) The total variation distance between two probability distributions $P$ and $Q$ on the same
set, is defined as $d_{TVD}(P,Q) = \frac{1}{2} \sum_i |P(i) - Q(i)|$. An equivalent alternative way to define
this: $d_{TVD}(P,Q)$ is the maximum, over all events $E$, of $|P(E) - Q(E)|$. Hence $d_{TVD}(P,Q)$
is small iff all events have roughly the same probability under $P$ and under $Q$.

The Euclidean distance between two states $|\phi\rangle = \sum_i \alpha_i|i\rangle$ and $|\psi\rangle = \sum_i \beta_i|i\rangle$ is defined as
$\| |\phi\rangle - |\psi\rangle \| = \sqrt{\sum_i |\alpha_i - \beta_i|^2}$. Assume the two states are unit vectors with (for simplicity)
real amplitudes. Suppose the Euclidean distance is small: $\| |\phi\rangle - |\psi\rangle \| = \epsilon$. If we measure $|\phi\rangle$
in the computational basis then the probability distribution over the outcomes is given by
the $|\alpha_i|^2$, and if we measure $|\psi\rangle$ then the probabilities are $|\beta_i|^2$. Show that these distributions
are close: the total variation distance $\frac{1}{2} \sum_i |\alpha_i^2 - \beta_i^2|$ is $\leq \epsilon$.


4. (H) The operator norm of a matrix $A$ is defined as $\|A\| = \max_{v : \|v\| = 1} \|Av\|$.
An equivalent definition is that $\|A\|$ is the largest singular value of $A$ (see Appendix A.5).
The distance between two matrices $A$ and $B$ is defined as $\|A - B\|$.

(a) What is the distance between the $2 \times 2$ identity matrix and the phase-gate $\begin{pmatrix} 1 & 0 \\ 0 & e^{i\phi} \end{pmatrix}$?

(b) What is the distance between the $4 \times 4$ identity matrix and the controlled version of the
phase gate of (a)?

(c) What is the distance between the $2^n \times 2^n$ identity matrix $I_{2^n}$ and the controlled phase
gate of (b) tensored with $I_{2^{n-2}}$?

(d) Suppose we have a product of $n$-qubit unitaries $U = U_T U_{T-1} \cdots U_1$ (for instance, each $U_i$
could be an elementary gate on a few qubits, tensored with identity on the other qubits).
Suppose we drop the $j$-th gate from this sequence: $U' = U_T U_{T-1} \cdots U_{j+1} U_{j-1} \cdots U_1$.
Show that $\|U' - U\| = \|I - U_j\|$.

(e) Now we also drop the $k$-th unitary: $U'' = U_T U_{T-1} \cdots U_{j+1} U_{j-1} \cdots U_{k+1} U_{k-1} \cdots U_1$.
Show that $\|U'' - U\| \leq \|I - U_j\| + \|I - U_k\|$.

(f) Give a quantum circuit with $O(n \log n)$ elementary gates that has distance less than $1/n$
from the Fourier transform $F_{2^n}$.

Comment: The above exercise shows the important fact that if we have a quantum circuit $C$ that has various
subparts (“subroutines”), then a circuit $\tilde{C}$ where those subroutines are implemented with small operator-norm
error, rather than perfectly, still works well: if $\|C - \tilde{C}\|$ is small then (by definition of operator norm) for all
initial states $|\phi\rangle$ the states $C|\phi\rangle$ and $\tilde{C}|\phi\rangle$ are close in Euclidean distance. By Exercise 3 then also the final
output distributions are close (in total variation distance).


5. Suppose $a \in \mathbb{R}^N$ is a vector (indexed by $\ell = 0,\ldots,N-1$) which is $r$-periodic in the following
sense: there exists an integer $r$ such that $a_\ell = 1$ whenever $\ell$ is an integer multiple of $r$, and
$a_\ell = 0$ otherwise. Compute the Fourier transform $F_N a$ of this vector, i.e., write down a
formula for the entries of the vector $F_N a$. Assuming $r$ divides $N$, write down a simple closed
form for the formula for the entries. Which are the nonzero entries in the vector $F_N a$, and
what is their magnitude?


6. (a) The squared Fourier transform, $F_N^2$, turns out to map computational basis states to
computational basis states. Describe this map, i.e., determine to which basis state a
basis state $|k\rangle$ gets mapped for each $k \in \{0,1\}^n$.

(b) Show that $F_N^4 = I$. What can you conclude about the eigenvalues of $F_N$?


Chapter 5


Shor’s Factoring Algorithm 


5.1 Factoring 


Probably the most important quantum algorithm so far is Shor’s factoring algorithm [228]. It can
find a factor of a composite number $N$ in roughly $(\log N)^2$ steps, which is polynomial in the length
$\log N$ of the input. On the other hand, there is no known classical (deterministic or randomized)
algorithm that can factor $N$ in polynomial time. The best known classical randomized algorithms
run in time roughly

\[ 2^{(\log N)^{\alpha}}, \]

where $\alpha = 1/3$ for a heuristic upper bound [165] and $\alpha = 1/2$ for a less-heuristic but still not fully
proven upper bound [166]. In fact, much of modern cryptography is based on the conjecture that
no fast classical factoring algorithm exists [211]. All this cryptography (for example RSA) would 
be broken if Shor’s algorithm could be physically realized. In terms of complexity classes: factoring 
(rather, the decision problem equivalent to it) is provably in BQP but is not known to be in BPP. 
If indeed factoring is not in BPP, then the quantum computer would be the first counterexample 
to the “strong” Church-Turing thesis, which states that all “reasonable” models of computation 
are polynomially equivalent (see [107] and [198, p.31,36]). 


5.2 Reduction from factoring to period-finding 


The crucial observation of Shor was that there is an efficient quantum algorithm for the problem 
of period-finding and that factoring can be reduced to this, in the sense that an efficient algorithm 
for period-finding implies an efficient algorithm for factoring. 

We first explain the reduction. Suppose we want to find factors of the composite number $N > 1$.
We may assume $N$ is odd and not a prime power, since those cases can easily be filtered out by a
classical algorithm. Now randomly choose some integer $x \in \{2,\ldots,N-1\}$. If $x$ is not coprime¹ to
$N$, then the greatest common divisor of $x$ and $N$ is a nontrivial factor of $N$, so then we are already
done. From now on assume $x$ and $N$ are coprime, so $x$ is an element of the multiplicative group $\mathbb{Z}_N^*$.

¹The greatest common divisor of two integers $a$ and $b$ is the largest positive integer $c$ that divides both $a$ and $b$.
If $\gcd(a,b) = 1$, then $a$ and $b$ are called coprime. The gcd can be computed efficiently (in time roughly quadratic in
the number of bits of $a$ and $b$) on a classical computer by Euclid’s algorithm.

Consider the sequence

\[ 1 = x^0 \pmod{N}, \quad x^1 \pmod{N}, \quad x^2 \pmod{N}, \ \ldots \]


This sequence will cycle after a while: there is a least $0 < r < N$ such that $x^r = 1 \pmod{N}$. This
$r$ is called the period of the sequence (a.k.a. the order of the element $x$ in the group $\mathbb{Z}_N^*$). Assuming
$N$ is odd and not a prime power (those cases are easy to factor anyway), it can be shown that
with probability $\geq 1/2$, the period $r$ is even and $x^{r/2} + 1$ and $x^{r/2} - 1$ are not multiples of $N$ [196,
Theorem A4.13]. In that case we have:


\[ x^r \equiv 1 \bmod N \iff \]
\[ (x^{r/2})^2 \equiv 1 \bmod N \iff \]
\[ (x^{r/2} + 1)(x^{r/2} - 1) \equiv 0 \bmod N \iff \]
\[ (x^{r/2} + 1)(x^{r/2} - 1) = kN \text{ for some } k. \]


Note that $k > 0$ because both $x^{r/2} + 1 > 0$ and $x^{r/2} - 1 > 0$ ($x > 1$). Hence $x^{r/2} + 1$ or $x^{r/2} - 1$
will share a factor with $N$. Because $x^{r/2} + 1$ and $x^{r/2} - 1$ are not multiples of $N$ this factor will
be $< N$, and in fact both these numbers will share a non-trivial factor with $N$. Accordingly, if we
have $r$ then we can compute the greatest common divisors $\gcd(x^{r/2} + 1, N)$ and $\gcd(x^{r/2} - 1, N)$,
and both of these two numbers will be non-trivial factors of $N$. If we are unlucky we might have
chosen an $x$ that does not give a factor (which we can detect efficiently), but trying a few different
random $x$ gives a high probability of finding a factor.
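
The classical side of this reduction fits in a few lines of Python. The sketch below is an illustration under one loud assumption: the period $r$ is found by brute force in the hypothetical helper `order`, which takes time exponential in $\log N$ and merely stands in for the quantum period-finding subroutine. The gcds are taken with $x^{r/2} \bmod N$, which gives the same result as using $x^{r/2}$ itself.

```python
from math import gcd

def order(x: int, N: int) -> int:
    """Least r > 0 with x^r = 1 mod N, by brute force (placeholder for period-finding)."""
    r, y = 1, x % N
    while y != 1:
        y = (y * x) % N
        r += 1
    return r

def factor_from_period(x: int, N: int):
    """Return a pair of non-trivial factors of N derived from x, or None if x is unlucky."""
    g = gcd(x, N)
    if g != 1:
        return (g, N // g)        # x already shares a factor with N
    r = order(x, N)
    if r % 2 == 1:
        return None               # odd period: try another x
    h = pow(x, r // 2, N)         # x^{r/2} mod N
    if h == N - 1:
        return None               # x^{r/2} + 1 is a multiple of N: try another x
    # h != 1 holds automatically, since r is the least period.
    return (gcd(h - 1, N), gcd(h + 1, N))
```

For $N = 15$ and $x = 7$ the period is $r = 4$ and $x^{r/2} \bmod N = 4$, so the two gcds are $\gcd(3, 15) = 3$ and $\gcd(5, 15) = 5$; the choice $x = 14$ has period 2 with $x^{r/2} + 1$ a multiple of 15, so it is one of the unlucky cases.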

Thus the problem of factoring reduces to finding the period $r$ of the function given by modular 
exponentiation $f(a) = x^a \bmod N$. In general, the period-finding problem can be stated as follows: 


The period-finding problem: 
We are given some function $f : \mathbb{N} \to \{0,\ldots,N-1\}$ with the property that there is some unknown 
$r \in \{0,\ldots,N-1\}$ such that $f(a) = f(b)$ iff $a = b \bmod r$. The goal is to find $r$. 


One might think that if f itself is efficiently computable, then period-finding is an easy problem to 
solve even on a classical computer: just compute f(0), f(1), f(2),... until we encounter the value 
f(0) for the second time. The input at which this happens is the period r that we’re trying to 
find. The problem with this approach is that $r$ could be huge, for instance $N^{1/2}$ or $N/10$, which 
is exponentially large in the number of input bits. To be efficient, we would like a runtime that 
is polynomial in log N, since that is the bitsize of the inputs to f. It is generally believed that 
classical computers cannot solve period-finding efficiently. 

We will show below how we can solve this problem efficiently on a quantum computer, using 
only $O(\log\log N)$ evaluations of $f$ and $O(\log\log N)$ quantum Fourier transforms. An evaluation 
of $f$ can be viewed as analogous to the application of a query in the algorithms of the previous 
chapters. Even a somewhat more general kind of period-finding can be solved by Shor’s algorithm 
with very few $f$-evaluations, whereas any classical bounded-error algorithm would need to evaluate 
the function $\Omega(N^{1/3}/\sqrt{\log N})$ times in order to find the period [88]. 

How many steps (elementary gates) does Shor’s algorithm take? For $a = N^{O(1)}$ we can compute 
$f(a) = x^a \bmod N$ in $O((\log N)^2 \log\log N \log\log\log N)$ steps by the “square-and-multiply” 
method, using known algorithms for fast integer multiplication mod $N$, see Exercise 1. 
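
A minimal Python sketch of the square-and-multiply idea follows. The function name `mod_exp` is ours, and the sketch only illustrates the structure (about $\log a$ squarings and at most $\log a$ multiplications mod $N$); the stated gate count additionally relies on fast integer multiplication, which is not modelled here.

```python
def mod_exp(x, a, N):
    """Compute x^a mod N by square-and-multiply, scanning the bits of a from least significant."""
    result = 1
    power = x % N                  # invariant: power = x^(2^i) mod N when looking at bit i
    while a > 0:
        if a & 1:                  # bit i of a is set: multiply the corresponding power in
            result = (result * power) % N
        power = (power * power) % N
        a >>= 1
    return result

print(mod_exp(7, 13, 15))  # 7, since 7^4 = 1 mod 15 and 13 = 3*4 + 1
```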

Moreover, as explained in the previous chapter, the quantum Fourier transform can be implemented 
using $O((\log N)^2)$ steps. Accordingly, Shor’s algorithm finds a factor of $N$ using an 
expected number of $O((\log N)^2 (\log\log N)^2 \log\log\log N)$ gates, which is only slightly worse than 
quadratic in the input length. 


5.3 Shor’s period-finding algorithm 


Now we will show how Shor’s algorithm finds the period $r$ of the function $f$, given a “black-box” that 
maps $|a\rangle|0^n\rangle \mapsto |a\rangle|f(a)\rangle$. We can always efficiently pick some $q = 2^\ell$ such that $N^2 < q \leq 2N^2$. 
Then we can implement the Fourier transform $F_q$ using $O((\log N)^2)$ gates. Let $O_f$ denote the 
unitary that maps $|a\rangle|0^n\rangle \mapsto |a\rangle|f(a)\rangle$, where the first register consists of $\ell$ qubits, and the second 
of $n = \lceil \log N \rceil$ qubits. 


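Figure 5.1: Shor’s period-finding algorithm (circuit: both registers start in the all-zero state; $F_q$ acts on the first register, $O_f$ on both registers, the second register is measured, and $F_q$ is applied to the first register again before it is measured) 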


Shor’s period-finding algorithm is illustrated in Figure 5.1.² Start with $|0^\ell\rangle|0^n\rangle$. Apply the 
QFT (or just Hadamard gates) to the first register to build the uniform superposition 

$$\frac{1}{\sqrt{q}} \sum_{a=0}^{q-1} |a\rangle|0^n\rangle.$$

The second register still consists of zeroes. Now use the “black-box” to compute $f(a)$ in quantum 
parallel: 

$$\frac{1}{\sqrt{q}} \sum_{a=0}^{q-1} |a\rangle|f(a)\rangle.$$


Observing the second register gives some value $f(s)$, with $s < r$. Let $m$ be the number of elements 
of $\{0,\ldots,q-1\}$ that map to the observed value $f(s)$. Because $f(a) = f(s)$ iff $a = s \bmod r$, the 
$a$ of the form $a = jr+s$ ($0 \leq j < m$) are exactly the $a$ for which $f(a) = f(s)$. Thus the first 
register collapses to a superposition of $|s\rangle, |r+s\rangle, |2r+s\rangle, |3r+s\rangle, \ldots$; this superposition runs until 
the last number of the form $jr+s$ that is $< q$. Recall that $m$ is the number of elements in 
this superposition, i.e., the number of integers $j$ such that $jr+s \in \{0,\ldots,q-1\}$ (depending on $s$, 


²Notice the resemblance of the basic structure (Fourier, $f$-evaluation, Fourier) with the basic structure of Simon’s 
algorithm (Hadamard, query, Hadamard). This is not a coincidence, because Shor was inspired by reading Simon’s 
paper. The number of qubits used is roughly $3\log N$: $\log q \approx 2\log N$ qubits for the first register, and $\lceil\log N\rceil$ for the 
second register. This number can be reduced to slightly more than $2\log N$ qubits [38, 131]. Accordingly, to factor 
for instance a 2048-bit integer $N$, slightly more than 4096 (perfect) qubits suffice. 
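this $m$ will be $\lceil q/r\rceil$ or $\lfloor q/r\rfloor$). The second register collapses to the classical state $|f(s)\rangle$. We can 
now ignore the second register, and have in the first: 

$$\frac{1}{\sqrt{m}} \sum_{j=0}^{m-1} |jr+s\rangle.$$

Applying the QFT again gives 

$$\frac{1}{\sqrt{m}} \sum_{j=0}^{m-1} \frac{1}{\sqrt{q}} \sum_{b=0}^{q-1} e^{2\pi i\frac{(jr+s)b}{q}} |b\rangle = \frac{1}{\sqrt{mq}} \sum_{b=0}^{q-1} e^{2\pi i\frac{sb}{q}} \left( \sum_{j=0}^{m-1} e^{2\pi i\frac{jrb}{q}} \right) |b\rangle.$$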


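We want to see which $|b\rangle$ have amplitudes with large squared absolute value; those are the $b$ we are 
likely to see if we now measure. Using that $\sum_{j=0}^{m-1} z^j = (1-z^m)/(1-z)$ for $z \neq 1$ (see Appendix B), 
we compute: 

$$\sum_{j=0}^{m-1} e^{2\pi i\frac{jrb}{q}} = \sum_{j=0}^{m-1} \left(e^{2\pi i\frac{rb}{q}}\right)^j = \begin{cases} m & \text{if } e^{2\pi i\frac{rb}{q}} = 1 \\[4pt] \dfrac{1 - e^{2\pi i\frac{mrb}{q}}}{1 - e^{2\pi i\frac{rb}{q}}} & \text{if } e^{2\pi i\frac{rb}{q}} \neq 1 \end{cases} \qquad (5.1)$$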


Easy case: $r$ divides $q$. Let us do an easy case first. Suppose $r$ divides $q$, so the whole period 
“fits” an integer number of times in the domain $\{0,\ldots,q-1\}$ of $f$, and $m = q/r$. For the first 
case of Eq. (5.1), note that $e^{2\pi i rb/q} = 1$ iff $rb/q$ is an integer iff $b$ is a multiple of $q/r$. Such $b$ will 
have squared amplitude equal to $(m/\sqrt{mq})^2 = m/q = 1/r$. Since there are exactly $r$ such basis 
states $b$, together they have all the amplitude: the sum of squares of those amplitudes is 1, so 
the amplitudes of $b$ that are not integer multiples of $q/r$ must all be 0. Thus we are left with a 
superposition where only the $b$ that are integer multiples of $q/r$ have nonzero amplitude. Observing 
this final superposition gives some random multiple $b = cq/r$, with $c$ a uniformly random number 
in $\{0,\ldots,r-1\}$. Thus we get a $b$ such that 

$$\frac{b}{q} = \frac{c}{r},$$

where $b$ and $q$ are known to the algorithm, and $c$ and $r$ are not. There are $\phi(r) \in \Omega(r/\log\log r)$ 
numbers smaller than $r$ that are coprime to $r$ [133, Theorem 328], so $c$ will be coprime to $r$ with 
probability $\Omega(1/\log\log r) \geq \Omega(1/\log\log N)$. Accordingly, an expected number of $O(\log\log N)$ 
repetitions of the procedure of this section suffices to obtain a $b = cq/r$ with $c$ coprime to $r$.³ Once 
we have such a $b$, we can obtain $r$ as the denominator by writing $b/q$ in lowest terms. Of course, 
our algorithm doesn’t actually know whether $c$ and $r$ are coprime in some particular run of the 
algorithm, but it can efficiently check if the purported factors $\gcd(x^{r/2} \pm 1, N)$ are actual factors of $N$ 
by division (which, like multiplication, can be done classically with a near-linear number of gates). 
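
The easy case can be checked numerically with a small classical simulation. The sketch below is an illustration only: the function name `qft_probabilities` and the parameters ($q = 256$, $r = 4$, $s = 3$, which would arise for instance from $N = 15$, $x = 7$) are our choices, and the Fourier transform is evaluated by a direct sum rather than by a quantum circuit.

```python
import cmath
from fractions import Fraction

def qft_probabilities(q, r, s):
    """Outcome distribution after applying F_q to (1/sqrt(m)) * sum_j |jr+s>."""
    support = list(range(s, q, r))        # the a = jr + s that lie below q
    m = len(support)
    probs = []
    for b in range(q):
        amp = sum(cmath.exp(2j * cmath.pi * a * b / q) for a in support)
        probs.append(abs(amp) ** 2 / (m * q))
    return probs

q, r, s = 256, 4, 3                       # r divides q, so m = q/r = 64
probs = qft_probabilities(q, r, s)
peaks = [b for b, p in enumerate(probs) if p > 1e-9]
print(peaks)                                        # [0, 64, 128, 192]
print([Fraction(b, q).denominator for b in peaks])  # [1, 4, 2, 4]
```

Only the multiples of $q/r = 64$ have nonzero probability (each $1/r$), and writing $b/q$ in lowest terms returns $r = 4$ exactly for the outcomes $b = 64$ and $b = 192$, where $c \in \{1, 3\}$ is coprime to $r$.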


Hard case: r does not divide q. Because our q is a power of 2, it is actually quite likely that 
r does not divide q. However, the same algorithm will still yield with high probability a b which 
is close to a multiple of $q/r$. Note that $q/r$ is no longer an integer, and $m = \lfloor q/r\rfloor$, possibly $+1$. 


³The number of required $f$-evaluations for period-finding can actually be reduced from $O(\log\log N)$ to $O(1)$. 
All calculations up to and including Eq. (5.1) are still valid. Using $|1 - e^{i\theta}| = 2|\sin(\theta/2)|$, we can 
rewrite the absolute value of the second case of Eq. (5.1) to 

$$\frac{\left|1 - e^{2\pi i\frac{mrb}{q}}\right|}{\left|1 - e^{2\pi i\frac{rb}{q}}\right|} = \frac{|\sin(\pi mrb/q)|}{|\sin(\pi rb/q)|}.$$


The right-hand side is the ratio of two sine-functions of b, where the numerator oscillates much 
faster than the denominator because of the additional factor of m. Note that the denominator is 
close to 0 (making the ratio large) iff $b$ is close to an integer multiple of $q/r$. For most of those $b$, 
the numerator won’t be close to 0. Hence, roughly speaking, the ratio will be small if b is far 
from an integer multiple of q/r, and large for most b that are close to a multiple of q/r. Doing 
the calculation precisely, one can show that with high probability (see [228, 196] for details) the 
measurement yields a b such that 


$$\left|\frac{b}{q} - \frac{c}{r}\right| \leq \frac{1}{2q}$$

for a random $c \in \{0,\ldots,r-1\}$. Equivalently, $|b - cq/r| \leq 1/2$, so the measurement outcome $b$ will 
be an integer multiple of $q/r$ rounded up or down to an integer. As in the easy case, $b$ and $q$ are 
known to us while $c$ and $r$ are unknown. 

Because the known ratio $b/q$ is now not exactly equal to the unknown ratio $c/r$, we cannot just 
try to find $r$ by writing $b/q$ in lowest terms like we did in the easy case. However, two distinct 
fractions, each with denominator $\leq N$, must be at least $1/N^2 > 1/q$ apart.⁴ Therefore $c/r$ is the 
only fraction with denominator $\leq N$ at distance $\leq 1/2q$ from the known ratio $b/q$. Applying a 
classical method called “continued-fraction expansion” to $b/q$ efficiently gives us the fraction with 
denominator $\leq N$ that is closest to $b/q$ (see next section). This fraction must be $c/r$. Again, $c$ and 
$r$ will be coprime with probability $\Omega(1/\log\log r)$, in which case writing $c/r$ in lowest terms gives $r$. 
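
The classical post-processing of the hard case can be illustrated as follows. This is a sketch under assumptions: the helper name `recover_period_candidate` is ours, the example ($N = 21$, $x = 2$, $r = 6$, $q = 512$) is chosen by us, and instead of an explicit continued-fraction routine it relies on Python's `Fraction.limit_denominator`, which returns the closest fraction with bounded denominator.

```python
from fractions import Fraction

def recover_period_candidate(b, q, N):
    """Denominator of the fraction closest to b/q among those with denominator <= N."""
    return Fraction(b, q).limit_denominator(N).denominator

N, q, r = 21, 512, 6            # the order of 2 modulo 21 is 6, and N^2 = 441 < q <= 2N^2
for c in range(r):
    b = round(c * q / r)        # an outcome with |b - cq/r| <= 1/2
    print(c, b, recover_period_candidate(b, q, N))
# c = 1 (b = 85) and c = 5 (b = 427) are coprime to r and yield the candidate 6
```

For the other values of $c$ the candidate is only a proper divisor of $r$ (for example $c = 2$ gives $b = 171$ and candidate $3$), which is why the procedure is repeated until the resulting factors check out.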


5.4 Continued fractions 


Let $[a_0, a_1, a_2, \ldots]$ (finite or infinite) denote the real number 

$$a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{\ddots}}}$$

This is called a continued fraction (CF). The $a_i$ are the partial quotients. We assume these to be 
positive natural numbers ([133, p.131] calls such CF “simple”). $[a_0,\ldots,a_n]$ is the $n$-th convergent of 
the fraction. [133, Theorem 149 & 157] gives a simple way to compute numerator and denominator 
of the $n$-th convergent from the partial quotients: 

If 
$$\begin{aligned} p_0 &= a_0, & p_1 &= a_1a_0 + 1, & p_n &= a_np_{n-1} + p_{n-2} \\ q_0 &= 1, & q_1 &= a_1, & q_n &= a_nq_{n-1} + q_{n-2} \end{aligned}$$
then $[a_0,\ldots,a_n] = \frac{p_n}{q_n}$. Moreover, this fraction is in lowest terms. 


⁴Consider two fractions $c/r$ and $c'/r'$ with integer $c, c', r, r'$, and $r, r' \leq N$. If $c/r \neq c'/r'$ then $cr' - c'r$ is a nonzero 
integer, and hence $|c/r - c'/r'| = |(cr' - c'r)/rr'| \geq 1/|rr'| \geq 1/N^2$. 
Note that $q_n$ increases at least exponentially with $n$ ($q_n \geq 2q_{n-2}$). Given a real number $x$, the 
following “algorithm” gives a continued fraction expansion of $x$ [133, p.135]: 

$$\begin{aligned} a_0 &:= \lfloor x \rfloor, & x_1 &:= 1/(x - a_0) \\ a_1 &:= \lfloor x_1 \rfloor, & x_2 &:= 1/(x_1 - a_1) \\ a_2 &:= \lfloor x_2 \rfloor, & x_3 &:= 1/(x_2 - a_2) \\ &\;\;\vdots \end{aligned}$$


Informally, we just take the integer part of the number as the partial quotient and continue with the 
inverse of the fractional part of the number. The convergents of the CF approximate $x$ as follows [133, 
Theorem 164 & 171]: 

If $x = [a_0, a_1, \ldots]$ then $\left| x - \frac{p_n}{q_n} \right| < \frac{1}{q_n^2}$. 

Recall that $q_n$ increases exponentially with $n$, so this convergence is quite fast. Moreover, $p_n/q_n$ 
provides the best approximation of $x$ among all fractions with denominator $\leq q_n$ [133, Theorem 181]: 

If $n > 1$, $q \leq q_n$, $p/q \neq p_n/q_n$, then $\left| x - \frac{p_n}{q_n} \right| < \left| x - \frac{p}{q} \right|$. 
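
The expansion “algorithm” and the recurrence for the convergents can be sketched in Python for rational inputs, which is the case needed for $b/q$. The function names `partial_quotients` and `convergents` are ours, and the example value $85/512$ is an assumption chosen to match a possible measurement outcome for $q = 512$.

```python
from fractions import Fraction

def partial_quotients(x):
    """Partial quotients a_0, a_1, ... of a rational x: a_n = floor(x_n), x_{n+1} = 1/(x_n - a_n)."""
    quotients = []
    while True:
        a = x.numerator // x.denominator   # integer part of x
        quotients.append(a)
        if x == a:                         # no fractional part left: expansion is finite
            return quotients
        x = 1 / (x - a)

def convergents(quotients):
    """Pairs (p_n, q_n) from p_n = a_n p_{n-1} + p_{n-2} and q_n = a_n q_{n-1} + q_{n-2}."""
    p_prev, q_prev = 1, 0                  # conventional start values p_{-1} = 1, q_{-1} = 0
    p, q = quotients[0], 1
    result = [(p, q)]
    for a in quotients[1:]:
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
        result.append((p, q))
    return result

x = Fraction(85, 512)
print(partial_quotients(x))               # [0, 6, 42, 2]
print(convergents(partial_quotients(x)))  # [(0, 1), (1, 6), (42, 253), (85, 512)]
```

The convergent $1/6$ is the one with denominator below $N = 21$, matching $c/r = 1/6$ for a period $r = 6$.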


Exercises 


1. This exercise is about efficient classical implementation of modular exponentiation. 


(a) (H) Given $n$-bit numbers $x$ and $N$, compute the whole sequence 
$x^0 \bmod N$, $x^1 \bmod N$, $x^2 \bmod N$, $x^4 \bmod N$, $x^8 \bmod N$, $x^{16} \bmod N, \ldots, x^{2^{n-1}} \bmod N$, 
using $O(n^2 \log(n) \log\log(n))$ steps. 

(b) Suppose $n$-bit number $a$ can be written as $a = a_{n-1} \ldots a_1a_0$ in binary. Express $x^a \bmod N$ 
as a product of the numbers computed in part (a). 


(c) Show that you can compute $f(a) = x^a \bmod N$ in $O(n^2 \log(n) \log\log(n))$ steps. 
2. Consider the function $f(a) = 7^a \bmod 10$. 


(a) What is the period r of f? 


(b) Show how Shor’s algorithm finds the period of f, using a Fourier transform over q = 128 
elements. Write down all intermediate superpositions of the algorithm for this case 
(don’t just copy the general expressions from the notes, but instantiate them with actual 
numbers as much as possible, incl. with the value of the period found in (a)). You may 
assume you’re lucky, meaning the first run of the algorithm already gives a measurement 
outcome b = cq/r with c coprime to r. 


3. (H) This exercise explains basic RSA encryption. Suppose Alice wants to allow other people 
to send encrypted messages to her, such that she is the only one who can decrypt them. 
She believes that factoring an n-bit number can’t be done efficiently (efficient = in time 
polynomial in n). So in particular, she doesn’t believe in quantum computing. 


Alice chooses two large random prime numbers, $p$ and $q$, and computes their product $N = 
p \cdot q$ (a typical size is to have $N$ a number of $n = 1024$ bits, which corresponds to both 
$p$ and $q$ being numbers of roughly 512 bits). She computes the so-called Euler $\phi$-function: 
$\phi(N) = (p-1)(q-1)$; she also chooses an encryption exponent $e$, which doesn’t share any 
nontrivial factor with $\phi(N)$ (i.e., $e$ and $\phi(N)$ are coprime). Group theory guarantees there is 
an efficiently computable decryption exponent $d$ such that $de = 1 \bmod \phi(N)$. The public key 
consists of $e$ and $N$ (Alice puts this on her homepage), while the secret key consists of $d$ and $N$. 
Any number $m \in \{1,\ldots,N-1\}$ that is coprime to $N$, can be used as a message. There are 
$\phi(N)$ such $m$, and these numbers form a group under the operation of multiplication mod 
$N$. The number of bits $n = \lceil\log N\rceil$ of $N$ is the maximal length (in bits) of a message $m$ and 
also the length (in bits) of the encryption. The encryption function is defined as $C(m) = m^e 
\bmod N$, and the decryption function is $D(c) = c^d \bmod N$. 


(a) Give a randomized algorithm by which Alice can efficiently generate the secret and public 
key. 


(b) Show that Bob can efficiently compute the encoding C(m) of the message m that he 
wants to send to Alice, knowing the public key but not the private key. 


(c) Show that D(C(m)) = m for all possible messages. 
(d) Show that Alice can efficiently decrypt the encryption C(m) she receives from Bob. 
(e) Show that if Charlie could factor N, then he could efficiently decrypt Bob’s message. 
Chapter 6 


Hidden Subgroup Problem 


6.1 Hidden Subgroup Problem 


6.1.1 Group theory reminder 


A group $G$ consists of a set of elements (which is usually denoted by $G$ as well) and an operation 
$\circ : G \times G \to G$ (often written as addition or multiplication), such that 


1. the operation is associative: $g \circ (h \circ k) = (g \circ h) \circ k$ for all $g,h,k \in G$; 


2. there is an identity element $e \in G$ satisfying $e \circ g = g \circ e = g$ for every $g \in G$; 


3. and every $g \in G$ has an inverse $g^{-1} \in G$, such that $g \circ g^{-1} = g^{-1} \circ g = e$ (if the group 
operation is written as addition, then $g^{-1}$ is written as $-g$). 


We often abbreviate $g \circ h$ to $gh$. The group is Abelian (or commutative) if $gh = hg$ for all $g,h \in G$. 
Simple examples of finite additive Abelian groups are $G = \{0,1\}^n$ with bitwise addition mod 2 
as the group operation, and $G = \mathbb{Z}_N$, the “cyclic group” of integers mod $N$. The set $G = \mathbb{Z}_N^*$ is 
the multiplicative group consisting of all integers in $\{1,\ldots,N-1\}$ that are coprime to $N$, with 
multiplication mod $N$ as the group operation.¹ An important example of a non-Abelian group is 
the “symmetric group” $S_n$, which is the group of $n!$ permutations of $n$ elements, using composition 
as the group operation. 

A subgroup $H$ of $G$, denoted $H \leq G$, is a subset of $G$ that is itself a group, i.e., it contains $e$ 
and is closed under taking products and inverses. A (left) coset of $H$ is a set $gH = \{gh \mid h \in H\}$, 
i.e., a translation of $H$ by the element $g$. All cosets of $H$ have size $|H|$, and it is easy to show that 
two cosets $gH$ and $g'H$ are either equal or disjoint, so the set of cosets partitions $G$ into equal-sized 
parts.² Note that $g$ and $g'$ are in the same coset of $H$ iff $g^{-1}g' \in H$. 
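
As a small illustration of the coset partition, here is a Python sketch for the additive cyclic group $\mathbb{Z}_N$. The function name `cosets` and the example ($N = 12$, $H$ generated by 4) are our choices, not part of the notes.

```python
def cosets(N, generator):
    """Subgroup H = <generator> of the additive group Z_N, together with its cosets g + H."""
    H = sorted({(generator * k) % N for k in range(N)})
    seen, result = set(), []
    for g in range(N):
        if g not in seen:                      # g lies in a coset we have not listed yet
            coset = sorted((g + h) % N for h in H)
            result.append(coset)
            seen.update(coset)
    return H, result

H, parts = cosets(12, 4)
print(H)      # [0, 4, 8]
print(parts)  # [[0, 4, 8], [1, 5, 9], [2, 6, 10], [3, 7, 11]]
```

The four cosets are disjoint, each has size $|H| = 3$, and together they cover all 12 elements.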

If $T \subseteq G$, then we use $\langle T \rangle$ to denote the set of elements of $G$ that we can write as products of 
elements from $T$ and their inverses. This $H = \langle T \rangle$ is a subgroup of $G$, and $T$ is called a generating 
set of $H$. Note that adding one more element $t \notin \langle T \rangle$ to $T$ at least doubles the size of the generated 
subgroup, because $H$ and $tH$ are disjoint and $H \cup tH \subseteq \langle T \cup \{t\} \rangle$. This implies that every $H \leq G$ 


¹Euler’s $\phi$-function counts the number of elements of $\{1,\ldots,N-1\}$ that are coprime to $N$, so $|\mathbb{Z}_N^*| = \phi(N)$. 
Note that for prime $p$, the multiplicative Abelian group $\mathbb{Z}_p^*$ is isomorphic to the additive group $\mathbb{Z}_{p-1}$. However, for 
general $N$, $\mathbb{Z}_N^*$ need not be isomorphic to $\mathbb{Z}_{\phi(N)}$. 

²This also proves Lagrange’s theorem for finite groups: if $H \leq G$ then $|H|$ divides $|G|$. 
has a generating set of size $\leq \log|H| \leq \log|G|$. We abbreviate $\langle \{\gamma\} \rangle$ to $\langle \gamma \rangle$, which is the cyclic 
group generated by $\gamma$; every cyclic group of size $N$ is isomorphic to $\mathbb{Z}_N$. 


6.1.2 Definition and some instances of the HSP 


The Hidden Subgroup Problem is the following: 


Given a known group $G$ and a function $f : G \to S$ where $S$ is some finite set. 

Suppose $f$ has the property that there exists a subgroup $H \leq G$ such that $f$ is constant 
within each coset, and distinct on different cosets: $f(g) = f(g')$ iff $gH = g'H$. 

Goal: find H. 


We assume f can be computed efficiently, meaning in time polynomial in log |G| (the latter is the 
number of bits needed to describe an input $g \in G$ for $f$). Since $H$ may be large, “finding $H$” 
typically means finding a generating set for H. 

This looks like a rather abstract algebraic problem, but many important problems can be written 
as an instance of the HSP. We will start with some examples where G is Abelian. 


Simon’s problem. This is a very natural instance of HSP. Here $G$ is the additive group $\mathbb{Z}_2^n = 
\{0,1\}^n$ of size $2^n$, $H = \{0, s\}$ for a “hidden” $s \in \{0,1\}^n$, and $f$ satisfies $f(x) = f(y)$ iff $x - y \in H$. 
Clearly, finding the generator of $H$ (i.e., finding $s$) solves Simon’s problem. 


Period-finding. As we saw in Chapter 5, we can factor a large number $N$ if we can solve the 
following: given an $x$ that is coprime to $N$ and associated function $f : \mathbb{Z} \to \mathbb{Z}_N^*$ by $f(a) = x^a \bmod N$, 
find the period $r$ of $f$.³ Since $\langle x \rangle$ is a size-$r$ subgroup of the group $\mathbb{Z}_N^*$, the period $r$ divides 
$|\mathbb{Z}_N^*| = \phi(N)$. Hence we can restrict the domain of $f$ to $\mathbb{Z}_{\phi(N)}$. 

Period-finding is an instance of the HSP as follows. Let $G = \mathbb{Z}_{\phi(N)}$ and consider its subgroup 
$H = \langle r \rangle$ of all multiples of $r$ up to $\phi(N)$ (i.e., $H = r\mathbb{Z}_{\phi(N)} = \{0, r, 2r, \ldots, \phi(N) - r\}$). Note that 
because of its periodicity, $f$ is constant on each coset $s + H$ of $H$, and distinct on different cosets. 
Also, $f$ is efficiently computable by repeated squaring. Since the hidden subgroup $H$ is generated 
by $r$, finding the generator of $H$ solves the period-finding problem. 


Discrete logarithm. Another problem often used in classical public-key cryptography is the 
discrete logarithm problem: given a generator $\gamma$ of a cyclic multiplicative group $C$ of size $N$ (so 
$C = \{\gamma^a \mid a \in \{0,\ldots,N-1\}\}$), and $A \in C$, can we find the unique $a \in \{0,1,\ldots,N-1\}$ such that 
$\gamma^a = A$? This $a$ is called the discrete logarithm of $A$ (w.r.t. generator $\gamma$). It is generally believed 
that classical computers need time roughly exponential in $\log N$ to compute $a$ from $A$ (and one 
can actually prove this in a model where we can only implement group operations via some 
“black-box” [229]). This assumption underlies for instance the security of Diffie-Hellman key exchange 
(where $C = \mathbb{Z}_p^*$ for some large prime $p$, see Exercise 3), as well as elliptic-curve cryptography. 

Discrete log is an instance of the HSP as follows. We take $G = \mathbb{Z}_N \times \mathbb{Z}_N$ and define function 
$f : G \to C$ by $f(x,y) = \gamma^x A^{-y}$, which is efficiently computable by repeated squaring. For group 
elements $g_1 = (x_1, y_1)$, $g_2 = (x_2, y_2) \in G$ we have 

$$f(g_1) = f(g_2) \iff \gamma^{x_1 - ay_1} = \gamma^{x_2 - ay_2} \iff (x_1 - x_2) = a(y_1 - y_2) \bmod N \iff g_1 - g_2 \in \langle (a,1) \rangle.$$


³This $r$ is also known as the order of the element $x$ in the group $\mathbb{Z}_N^*$, so this problem is also known as order-finding. 
Let $H$ be the subgroup of $G$ generated by the element $(a, 1)$, then we have an instance of the HSP. 
Finding the generator of the hidden subgroup H gives us a, solving the discrete log problem. 
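
The claim that $f$ hides the subgroup $\langle (a,1) \rangle$ can be verified exhaustively for a toy instance. The sketch below is an illustration under assumptions: the function name `hidden_subgroup_check` is ours, and we take $C = \mathbb{Z}_p^*$ for a small prime $p$ with a known generator $\gamma$ (so $N = p - 1$), computing $A^{-1}$ via Fermat's little theorem.

```python
def hidden_subgroup_check(p, gamma, a):
    """Check that f(x, y) = gamma^x * A^(-y) agrees exactly on cosets of <(a, 1)> in Z_N x Z_N."""
    N = p - 1                                   # size of the cyclic group Z_p^*
    A = pow(gamma, a, p)
    A_inv = pow(A, p - 2, p)                    # inverse of A modulo the prime p
    f = {(x, y): (pow(gamma, x, p) * pow(A_inv, y, p)) % p
         for x in range(N) for y in range(N)}
    H = {((a * k) % N, k) for k in range(N)}    # the subgroup generated by (a, 1)
    for (x1, y1), v1 in f.items():
        for (x2, y2), v2 in f.items():
            diff = ((x1 - x2) % N, (y1 - y2) % N)
            if (v1 == v2) != (diff in H):
                return False
    return True

print(hidden_subgroup_check(7, 3, 4))  # True (3 generates Z_7^*, and A = 3^4 mod 7 = 4)
```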


6.2 An efficient quantum algorithm if G is Abelian 


In this section we show that HSPs where G (and hence H) is Abelian, and where f is efficiently 
computable, can be solved efficiently by a quantum algorithm. This generalizes Shor’s factoring 
algorithm, and will also give an efficient quantum algorithm for computing discrete logarithms. 


6.2.1 Representation theory and the quantum Fourier transform 


We start by quickly explaining the basics of representation theory. The idea here is to replace 
group elements by matrices, so that linear algebra can be used as a tool in group theory. A 
$d$-dimensional representation of a multiplicative group $G$ is a map $\rho : g \mapsto \rho(g)$ from $G$ to the set of 
$d \times d$ invertible complex matrices, satisfying $\rho(gh) = \rho(g)\rho(h)$ for all $g,h \in G$. The latter property 
makes the map $\rho$ a homomorphism. It need not be an isomorphism (i.e., bijective); for example, 
the constant-1 function is a trivial representation of any group. A representation of $G$ is irreducible 
if it cannot be decomposed further into the direct sum of lower-dimensional representations of $G$. 
A 1-dimensional representation of $G$ is called a character of $G$ (sometimes linear character). Note 
that a character $\chi$ is irreducible, and the complex values $\chi(g)$ must have modulus 1 because 
$|\chi(g^k)| = |\chi(g)|^k$ for all integers $k$. For example, the group $\mathbb{Z}_2 = \{0,1\}$ has two characters: the $\chi$ 
that maps both elements to 1, and the $\chi$ that maps 0 to 1 and 1 to $-1$. 

In the remainder of this section we will restrict attention to the case where $G$ is Abelian 
(and usually finite). In the Abelian case the characters are exactly the irreducible representations 
(irreps): there are no irreps of dimension $> 1$. The “Basis Theorem” of group theory says that 
every finite Abelian group $G$ is isomorphic to a direct product $\mathbb{Z}_{N_1} \times \cdots \times \mathbb{Z}_{N_k}$ of cyclic groups. 
First consider just one cyclic group $\mathbb{Z}_N$, written additively. Consider the discrete Fourier transform 
(Chapter 4), which is an $N \times N$ matrix. Ignoring the normalizing factor of $1/\sqrt{N}$, its $k$-th column 
may be viewed as a map $\chi_k : \mathbb{Z}_N \to \mathbb{C}$ defined by $\chi_k(j) = \omega_N^{jk}$, where $\omega_N = e^{2\pi i/N}$. Note that 
$\chi_k(j + j') = \chi_k(j)\chi_k(j')$, so $\chi_k$ is actually a 1-dimensional representation (i.e., a character) of $\mathbb{Z}_N$. In 
fact, the $N$ characters corresponding to the $N$ columns of the Fourier matrix are all the characters 
of $\mathbb{Z}_N$. For Abelian groups $G$ that are (isomorphic to) a product $\mathbb{Z}_{N_1} \times \cdots \times \mathbb{Z}_{N_k}$ of cyclic groups, 
the $|G| = N_1 \cdots N_k$ characters are just the products of the characters of the individual cyclic groups 
$\mathbb{Z}_{N_j}$. Note that the characters are pairwise orthogonal. 


The set of all characters of $G$ forms a group $\widehat{G}$ with the operation of pointwise multiplication. 
This is called the dual group of $G$. If $H \leq G$, then the following is a subgroup of $\widehat{G}$ of size $|G|/|H|$: 

$$H^\perp = \{\chi_k \mid \chi_k(h) = 1 \text{ for all } h \in H\}.$$
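
For a single cyclic group these objects are easy to compute explicitly. The following Python sketch (function names `chi`, `inner_product` and `dual_labels` are ours, and the example $G = \mathbb{Z}_{12}$, $H = \{0,4,8\}$ is an assumption made for illustration) checks orthogonality of two characters and lists the labels of $H^\perp$.

```python
import cmath

def chi(N, k, j):
    """Value chi_k(j) = omega_N^(j*k) of the k-th character of Z_N."""
    return cmath.exp(2j * cmath.pi * j * k / N)

def inner_product(N, k, l):
    """Normalized inner product (1/N) * sum_j conj(chi_k(j)) * chi_l(j)."""
    return sum(chi(N, k, j).conjugate() * chi(N, l, j) for j in range(N)) / N

def dual_labels(N, H):
    """Labels k of the characters chi_k that equal 1 on every element of H."""
    return [k for k in range(N) if all(abs(chi(N, k, h) - 1) < 1e-9 for h in H)]

N, H = 12, [0, 4, 8]
print(round(abs(inner_product(N, 2, 2)), 6))  # 1.0
print(round(abs(inner_product(N, 2, 5)), 6))  # 0.0
print(dual_labels(N, H))                      # [0, 3, 6, 9]
```

The dual subgroup has $|G|/|H| = 12/3 = 4$ elements, as stated above.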


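Let us interpret the quantum Fourier transform in terms of the characters. For $k \in \mathbb{Z}_N$, define the 
state whose amplitudes are the (normalized) values of $\chi_k$: 

$$|\chi_k\rangle = \frac{1}{\sqrt{N}} \sum_{j=0}^{N-1} \chi_k(j)|j\rangle = \frac{1}{\sqrt{N}} \sum_{j=0}^{N-1} \omega_N^{jk}|j\rangle.$$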
With this notation, the QFT just maps the standard (computational) basis of $\mathbb{C}^N$ to the orthonormal 
basis corresponding to the characters: 

$$F_N : |k\rangle \mapsto |\chi_k\rangle.$$

As we saw in Chapter 4, this map can be implemented by an efficient quantum circuit if $N$ is a 
power of 2. The QFT corresponding to a group $G$ that is isomorphic to $\mathbb{Z}_{N_1} \times \cdots \times \mathbb{Z}_{N_k}$ is just the 
tensor product of the QFTs for the individual cyclic groups. For example, the QFT corresponding 
to $\mathbb{Z}_2$ is the Hadamard gate $H$, so the QFT corresponding to $\mathbb{Z}_2^n$ is $H^{\otimes n}$ (which is of course very 
different from the QFT corresponding to $\mathbb{Z}_{2^n}$). 


6.2.2 A general algorithm for Abelian HSP 


The following is an efficient quantum algorithm for solving the HSP for some Abelian group G 
(written additively) and function $f : G \to S$. This algorithm, sometimes called the “standard 
algorithm” for HSP, was first observed by Kitaev [155] (inspired by Shor’s algorithm) and worked 
out further by many, for instance Mosca and Ekert [192]. 


1. Start with $|0\rangle|0\rangle$, where the two registers have dimension $|G|$ and $|S|$, respectively. 


2. Create a uniform superposition over $G$ in the first register: $\frac{1}{\sqrt{|G|}} \sum_{g \in G} |g\rangle|0\rangle$. 


3. Compute $f$ in superposition: $\frac{1}{\sqrt{|G|}} \sum_{g \in G} |g\rangle|f(g)\rangle$. 


4. Measure the second register. This yields some value $f(s)$ for unknown $s \in G$. The first 
register collapses to a superposition over the $g$ with the same $f$-value as $s$ (i.e., the coset 
$s + H$): $\frac{1}{\sqrt{|H|}} \sum_{h \in H} |s + h\rangle$. 


5. Apply the QFT corresponding to $G$ to this state, giving $\frac{1}{\sqrt{|H|}} \sum_{h \in H} |\chi_{s+h}\rangle$. 


6. Measure and output the resulting $g$. 


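The key to understanding this algorithm is to observe that step 5 maps the uniform superposition 
over the coset $s + H$ to a uniform superposition over the labels of $H^\perp$: 

$$\begin{aligned} \frac{1}{\sqrt{|H|}} \sum_{h \in H} |\chi_{s+h}\rangle &= \frac{1}{\sqrt{|H||G|}} \sum_{h \in H} \sum_{g \in G} \chi_{s+h}(g)|g\rangle \\ &= \frac{1}{\sqrt{|H||G|}} \sum_{g \in G} \chi_s(g) \sum_{h \in H} \chi_h(g)|g\rangle = \sqrt{\frac{|H|}{|G|}} \sum_{g : \chi_g \in H^\perp} \chi_s(g)|g\rangle, \end{aligned}$$

where the last equality follows from the orthogonality of characters of the group $H$ (note that $\chi_g$ 
restricted to $H$ is a character of $H$, and it’s the constant-1 character iff $\chi_g \in H^\perp$): 

$$\sum_{h \in H} \chi_h(g) = \sum_{h \in H} \chi_g(h) = \begin{cases} |H| & \text{if } \chi_g \in H^\perp \\ 0 & \text{if } \chi_g \notin H^\perp \end{cases}$$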
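
As a numerical sanity check of this derivation, the following Python sketch evaluates the outcome distribution of steps 4 and 5 classically for $G = \mathbb{Z}_N$. The function name `output_distribution` and the example ($N = 12$, $H = \{0,4,8\}$) are our assumptions; the amplitudes are computed directly from $|\chi_k\rangle = \frac{1}{\sqrt{N}} \sum_j \omega_N^{jk} |j\rangle$.

```python
import cmath

def output_distribution(N, H, s):
    """Probability of each outcome g after steps 4-5, for G = Z_N and the coset s + H."""
    omega = cmath.exp(2j * cmath.pi / N)
    probs = []
    for g in range(N):
        # amplitude of |g> in (1/sqrt(|H|)) * sum_h |chi_{s+h}>
        amp = sum(omega ** (g * ((s + h) % N)) for h in H) / (len(H) * N) ** 0.5
        probs.append(abs(amp) ** 2)
    return probs

for s in (0, 5):
    probs = output_distribution(12, [0, 4, 8], s)
    print([g for g, p in enumerate(probs) if p > 1e-9])  # [0, 3, 6, 9] for both cosets
```

The support is the set of labels of $H^\perp$, each with probability $|H|/|G| = 1/4$, regardless of which coset $s + H$ was selected.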
The phases $\chi_s(g)$ do not affect the probabilities of the final measurement, since $|\chi_s(g)|^2 = 1$. The 
above algorithm thus samples uniformly from the (labels of) elements of $H^\perp$. Each such element 
$\chi_g \in H^\perp$ gives us a constraint on $H$ because $\chi_g(h) = 1$ for all $h \in H$.⁴ Generating a small number 
of such elements will give sufficient information to find the generators of $H$ itself. Consider our 
earlier examples of Abelian HSP: 


Simon’s problem. Recall that $G = \mathbb{Z}_2^n = \{0,1\}^n$ and $H = \{0,s\}$ for the HSP corresponding 
to Simon’s problem. Setting up the uniform superposition over $G$ can be done by applying $H^{\otimes n}$ 
to the initial state $|0^n\rangle$ of the first register. The QFT corresponding to $G$ is just $H^{\otimes n}$. The $2^n$ 
characters are $\chi_g(x) = (-1)^{x \cdot g}$. The algorithm will uniformly sample from labels of elements of 

$$H^\perp = \{\chi_g \mid \chi_g(h) = 1 \text{ for all } h \in H\} = \{\chi_g \mid g \cdot s = 0\}.$$

Accordingly, the algorithm samples uniformly from the $g \in \{0,1\}^n$ such that $g \cdot s = 0 \pmod 2$. 
Doing this an expected $O(n)$ times gives $n - 1$ linearly independent equations about $s$, from which 
we can find $s$ using Gaussian elimination. 
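
The classical post-processing for Simon’s problem can be sketched as Gaussian elimination over $\mathbb{Z}_2$. In the sketch below the function name `solve_simon` is ours, bit strings are encoded as Python integers (bitmasks), and the sample lists are made-up inputs satisfying $g \cdot s = 0$.

```python
def solve_simon(samples, n):
    """Find the nonzero s in {0,1}^n with g.s = 0 (mod 2) for all samples g, given as bitmasks."""
    pivots = {}                                   # leading bit -> row in echelon form
    for g in samples:
        for bit in sorted(pivots, reverse=True):  # eliminate known leading bits from g
            if (g >> bit) & 1:
                g ^= pivots[bit]
        if g:
            pivots[g.bit_length() - 1] = g        # g is independent of the earlier rows
    free = [b for b in range(n) if b not in pivots]
    if len(free) != 1:
        return None                               # fewer than n-1 independent equations
    s = 1 << free[0]                              # set the single free variable to 1
    for bit in sorted(pivots):                    # back-substitute from the lowest pivot upward
        if bin(pivots[bit] & s).count("1") % 2 == 1:
            s |= 1 << bit
    return s

print(bin(solve_simon([0b010, 0b111, 0b101], 3)))  # 0b101
```

With fewer than $n - 1$ independent samples the function returns `None`, signalling that more samples are needed.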


Period-finding. For the HSP corresponding to period-finding, $G = \mathbb{Z}_{\phi(N)}$ and $H = \langle r \rangle$, and 

$$H^\perp = \{\chi_b \mid e^{2\pi i bh/\phi(N)} = 1 \text{ for all } h \in H\} = \{\chi_b \mid br/\phi(N) \in \{0,\ldots,r-1\}\}.$$

Accordingly, the output of the algorithm is an integer multiple $b = c\phi(N)/r$ of $\phi(N)/r$, for uniformly 
random $c \in \{0,\ldots,r-1\}$. 

Notice that the algorithm doesn’t actually know $\phi(N)$, which creates two problems. First, of 
the 4 numbers $b, c, \phi(N), r$ involved in the equation $b = c\phi(N)/r$ we only know the measurement 
outcome $b$, which is not enough to compute $r$. Second, step 5 of the algorithm wants to do a QFT 
corresponding to the group $\mathbb{Z}_{\phi(N)}$ but it doesn’t know $\phi(N)$; and even if we knew $\phi(N)$, we’ve 
only seen how to efficiently implement a QFT over $\mathbb{Z}_q$ when $q$ is a power of 2. Fortunately, if we 
actually use the QFT over $\mathbb{Z}_q$ for $q$ a power of 2 that is roughly $N^2$ (and in step 2 set up a uniform 
superposition over $\mathbb{Z}_q$ instead of over $G$), then one can show that the above algorithm still works: 
the measurement yields an integer $b$ that (with high probability) is close to an integer multiple of 
$q/r$.⁵ This is basically just Shor’s algorithm as described in Chapter 5. 


Discrete logarithm. For the HSP corresponding to the discrete log problem, where $G = \mathbb{Z}_N \times \mathbb{Z}_N$ 
and $H = \langle (a,1) \rangle$, a small calculation shows that $H^\perp = \{\chi_{(c,-ac)} \mid c \in \mathbb{Z}_N\}$ (see Exercise 2). Hence 
sampling from $H^\perp$ yields some label $(c, -ac) \in G$ of an element of $H^\perp$, from which we can compute 
the discrete logarithm $a$. The QFT corresponding to $G$ is $F_N \otimes F_N$, which we don’t know how to 
implement efficiently for arbitrary $N$, but which we can replace by $F_q \otimes F_q$ for some power-of-2 $q$ 
chosen to be somewhat larger than $N$. 
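
In the idealized setting where we sample exactly from $H^\perp$ (i.e., with $F_N \otimes F_N$ rather than the power-of-2 replacement), the last classical step is a modular division. The sketch below uses a hypothetical helper `dlog_from_sample`; it assumes the sampled label is $(c, d)$ with $d = -ac \bmod N$, and it only succeeds when $c$ is invertible mod $N$ (otherwise one would sample again). The call `pow(c, -1, N)` needs Python 3.8 or later.

```python
from math import gcd

def dlog_from_sample(c, d, N):
    """Recover a from a label (c, d) with d = -a*c mod N; None if c is not invertible mod N."""
    if gcd(c, N) != 1:
        return None
    return (-d * pow(c, -1, N)) % N       # a = -d * c^(-1) mod N

N, a = 6, 4
for c in range(N):
    d = (-a * c) % N                       # the label (c, -ac) that sampling would return
    print(c, d, dlog_from_sample(c, d, N))
# c = 1 and c = 5 are coprime to 6 and both return a = 4
```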


In the above algorithm we assumed G is a finite Abelian group. These techniques have been much 
extended to the case of infinite groups such as G = Z and even R, to obtain efficient quantum 
algorithms for problems like Pell’s equation [128], and computing properties in number fields [59]. 


⁴This is a linear constraint mod $N$. For example, say $G = \mathbb{Z}_{N_1} \times \mathbb{Z}_{N_2}$, $N = N_1N_2$, and $g = (g_1, g_2)$ is the label of an element 
of $H^\perp$. Then $1 = \chi_g(h) = \omega_{N_1}^{g_1h_1} \omega_{N_2}^{g_2h_2}$ for all $h = (h_1, h_2) \in H$, equivalently $g_1h_1N_2 + g_2h_2N_1 = 0 \bmod N$. 

⁵There is something to be proved here, but we will skip the details. In fact one can even use a Fourier transform 
for $q = O(N)$ instead of $O(N^2)$ [127]. Note that this also reduces the number of qubits used by Shor’s algorithm 
from roughly $3\log N$ to roughly $2\log N$. 
6.3 General non-Abelian HSP 


6.3.1 The symmetric group and the graph isomorphism problem 


The Abelian HSP covers a number of interesting computational problems, including period-finding 
and discrete log. However, there are also some interesting computational problems that can be cast 
as an instance of HSP with a non-Abelian G. Unfortunately we do not have an efficient algorithm 
for most non-Abelian HSPs. 

A good example is the graph isomorphism (GI) problem: given two undirected $n$-vertex graphs 
$G_1$ and $G_2$, decide whether there exists a bijection taking the vertices of $G_1$ to those of $G_2$ that 
makes the two graphs equal. No efficient classical algorithm is known for GI, so it would be great 
if we could solve this efficiently on a quantum computer.⁶ 

How can we try to solve this via the HSP? Let $\mathcal{G}$ be the $2n$-vertex graph that is the disjoint 
union of the two graphs $G_1$ and $G_2$. Let $G = S_{2n}$. Let $f$ map $\pi \in S_{2n}$ to $\pi(\mathcal{G})$, which means that 
edge $(i,j)$ becomes edge $(\pi(i), \pi(j))$. Let $H$ be the automorphism group $\mathrm{Aut}(\mathcal{G})$ of $\mathcal{G}$, which is the 
set of all $\pi \in S_{2n}$ that map $\mathcal{G}$ to itself. This gives an instance of the HSP, and solving it would give 
us a generating set of $H = \mathrm{Aut}(\mathcal{G})$. 

Assume for simplicity that each of $G_1$ and $G_2$ is connected. If $G_1$ and $G_2$ are not isomorphic, 
then the only automorphisms of $\mathcal{G}$ are the ones that permute vertices inside $G_1$ and inside $G_2$: 
$\mathrm{Aut}(\mathcal{G}) = \mathrm{Aut}(G_1) \times \mathrm{Aut}(G_2)$. However, if the two graphs are isomorphic, then $\mathrm{Aut}(\mathcal{G})$ will also 
contain a permutation that swaps the first $n$ with the second $n$ vertices. Accordingly, if we were 
able to find a generating set of the hidden subgroup $H = \mathrm{Aut}(\mathcal{G})$, then we can just check whether 
all generators are in $\mathrm{Aut}(G_1) \times \mathrm{Aut}(G_2)$ and decide graph isomorphism. 
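
The reduction can be illustrated classically for tiny graphs by brute force. This sketch is not an efficient algorithm and does not use the HSP: the function names are ours, and it enumerates all of $S_{2n}$ (rather than a generating set of the hidden subgroup) to see whether some automorphism of the disjoint union moves a vertex of the first graph into the second.

```python
from itertools import permutations

def automorphisms(n_vertices, edges):
    """All permutations of the vertices that map the edge set to itself (brute force)."""
    edge_set = {frozenset(e) for e in edges}
    result = []
    for pi in permutations(range(n_vertices)):
        if {frozenset((pi[i], pi[j])) for i, j in edges} == edge_set:
            result.append(pi)
    return result

def isomorphic_via_union(n, edges1, edges2):
    """Decide isomorphism of two connected n-vertex graphs via Aut of their disjoint union."""
    union = list(edges1) + [(i + n, j + n) for i, j in edges2]
    # For connected graphs, an automorphism that sends vertex 0 into the second
    # graph must swap the two components, which is possible iff they are isomorphic.
    return any(pi[0] >= n for pi in automorphisms(2 * n, union))

path_a = [(0, 1), (1, 2)]
path_b = [(0, 1), (0, 2)]
triangle = [(0, 1), (1, 2), (0, 2)]
print(isomorphic_via_union(3, path_a, path_b))    # True
print(isomorphic_via_union(3, path_a, triangle))  # False
```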


6.3.2 Non-Abelian QFT on coset states 


One can try to design a quantum algorithm for general, non-Abelian instances of the HSP along the 
lines of the earlier standard algorithm: set up a uniform superposition over a random coset of H, 
apply the QFT corresponding to G, measure the final state, and hope that the result gives useful 
information about H. QFTs corresponding to non-Abelian G are much more complicated than in 
the Abelian case, because the irreducible representations can now have dimension d > 1. For 
completeness, let’s write down the QFT anyway. Let $\widehat{G}$ denote the set of irreducible representations 
of $G$, and $\dim(\rho)$ be the dimension of a particular $\rho \in \widehat{G}$. We can assume without loss of generality 
that the $\dim(\rho) \times \dim(\rho)$ matrices $\rho(g)$ are unitary. The QFT corresponding to $G$ is defined as 
follows: 


$$|g\rangle \mapsto \sum_{\rho \in \widehat{G}} \sqrt{\frac{\dim(\rho)}{|G|}}\, |\rho\rangle \sum_{i,j=1}^{\dim(\rho)} \rho(g)_{ij} |i,j\rangle,$$


where $|\rho\rangle$ denotes a name or label of $\rho$. It can be shown that this map is unitary. In particular, 
$|G| = \sum_{\rho \in \widehat{G}} \dim(\rho)^2$, which implies that the dimensions on the left and the right are the same, 
and that the right-hand state has norm 1. In many cases this QFT can still be implemented 
with an efficient quantum circuit, including for the symmetric group $G = S_{2n}$ that is relevant for 
graph isomorphism [36, 190]. However, that is not enough for an efficient algorithm: the standard 
algorithm does not always yield much information about the hidden $H \leq S_{2n}$ [124, 191, 129]. 
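
As a small sanity check of the dimension count (a worked example of ours, not taken from the notes): the symmetric group $S_3$ has three irreducible representations, namely the trivial one, the sign representation, and one 2-dimensional representation, so

```latex
\[
  \sum_{\rho \in \widehat{S_3}} \dim(\rho)^2 \;=\; 1^2 + 1^2 + 2^2 \;=\; 6 \;=\; 3! \;=\; |S_3| .
\]
```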


⁶For a long time, the best algorithm for GI took time roughly $2^{\sqrt{n}}$ [33], but in a recent breakthrough Babai gave a 
“quasi-polynomial” algorithm, which is $2^{(\log n)^{O(1)}}$ time [32]. That’s not yet polynomial, but a lot faster than before. 
There are some special cases of non-Abelian HSP that can be computed efficiently, for instance 
for normal subgroups [130], solvable groups [244, 141], and nil-2 groups [142]. 


6.3.3 Query-efficient algorithm 


While we do not have a general efficient quantum algorithm for the non-Abelian HSP, there does 
exist an algorithm that needs to compute f only a few times, i.e., a query-efficient algorithm, due 
to Ettinger et al. [108]. We will sketch this now. 

Consider steps 1-3 of the standard algorithm for the Abelian case. Even in the general non- 
Abelian case, this produces a coset state, i.e., a two-register superposition where the second register 
ranges over the values of f, and the first register will be a uniform superposition over the coset of 
H that corresponds to that value of f. Suppose we do this m times, producing a state $|\psi_H\rangle$ which
is the tensor product of m coset states for the same unknown H (for simplicity, below we’ll ignore
the fact that this state also depends on the particular values f takes on the cosets of H). One can
show that the coset states corresponding to different possible H are pairwise almost orthogonal:
$|\langle\psi_H|\psi_{H'}\rangle|$ is exponentially small in m. How large should we take m to ensure that these states are
“sufficiently orthogonal” to enable us to learn H from $|\psi_H\rangle$? The hidden subgroup H is generated
by a set of $\le \log|G|$ elements. Hence the total number of possible H that we want to distinguish
is at most $\binom{|G|}{\log|G|} \le 2^{(\log|G|)^2}$. This upper bound on the number of possible H allows us to define
a POVM measurement $\{E_H\}$ (see Section 1.2.2 for the definition of POVM), with one element for
each possible hidden subgroup H, such that if we measure $|\psi_H\rangle$ with this POVM, then we are likely
to get the correct outcome H. Choosing $m = O((\log|G|)^2)$ makes the states “sufficiently orthogonal”
for this idea to work (see Exercise 4). This POVM need not be efficiently implementable: circuits
to implement it (using only a computational-basis measurement at the end) may require a number
of elementary gates that’s polynomial in |G|. But at least the number of times we need to query
the function f is only polylogarithmic in |G|.


For those interested in more HSP results, a good source is Childs’s lecture notes [82, Chapter 4-14]. 


Exercises 


1. Show that the Deutsch-Jozsa problem for n = 1 (i.e., where $f : \{0,1\} \to \{0,1\}$) is an instance
of the HSP. Explicitly say what G, f, H, and $H^\perp$ are, and how sampling from $H^\perp$ allows you
to solve the problem.


2. Show that for the HSP corresponding to discrete log, we indeed have $H^\perp = \{\chi_{(c,-ac)} \mid c \in \mathbb{Z}_N\}$
as claimed near the end of Section 6.2.2.


3. This exercise explains Diffie-Hellman key exchange, which is secure under the assumption
that the adversary cannot efficiently compute discrete logarithms. Alice and Bob choose a
public key consisting of a large prime p (say, of 1000 or 2000 bits) and generator $\gamma$ of the
group $\mathbb{Z}_p^*$, which has size $\phi(p) = p - 1$. To agree on a shared secret key K, Alice chooses a
uniformly random $a \in \{0,\ldots,p-2\}$ and sends Bob the group element $A = \gamma^a$; Bob chooses
a uniformly random $b \in \{0,\ldots,p-2\}$ and sends Alice $B = \gamma^b$. Alice and Bob use $K = \gamma^{ab}$ as
their secret key, which they can use for instance to encrypt messages using a one-time pad.


(a) Show that both Alice and Bob can efficiently compute K given the communication.
(b) Show that an adversary who can efficiently compute discrete logarithms can compute
K from the public key and the communication tapped from the channel (i.e., A, B, p
and $\gamma$, but not a and b).
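
To make the protocol concrete, here is a toy run in Python. It is only a sketch under loud assumptions: the prime p = 23 and generator 5 are illustrative values (real keys use primes of 1000 or more bits), and the helper names `keypair` and `discrete_log` are hypothetical. The brute-force discrete logarithm stands in for the efficient one an adversary would need, and is feasible here only because p is tiny.

```python
import secrets

# Toy public key, chosen only for illustration (assumption, not from the notes).
p = 23        # small prime
gamma = 5     # a generator of Z_23^* (5 has order 22 modulo 23)

def keypair(p, gamma):
    """Pick a uniformly random exponent in {0,...,p-2}; return (secret, public)."""
    secret = secrets.randbelow(p - 1)
    return secret, pow(gamma, secret, p)

a, A = keypair(p, gamma)     # Alice sends A = gamma^a
b, B = keypair(p, gamma)     # Bob sends B = gamma^b

K_alice = pow(B, a, p)       # Alice computes (gamma^b)^a
K_bob = pow(A, b, p)         # Bob computes (gamma^a)^b

def discrete_log(target, p, gamma):
    """Brute-force discrete log base gamma; exponential in the bit length of p."""
    exponent, power = 0, 1
    while power != target:
        power = power * gamma % p
        exponent += 1
    return exponent

# An eavesdropper who sees only A, B, p, gamma recovers a, and hence K.
K_eve = pow(B, discrete_log(A, p, gamma), p)
print(K_alice, K_bob, K_eve)
```

Modular exponentiation via the three-argument `pow` uses repeated squaring, so both parties need only polynomially many multiplications in the bit length of p.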


4. Suppose we are given an unknown state $|\psi_i\rangle$ from a known set of K states $\{|\psi_j\rangle \mid j \in [K]\}$.
(a) Suppose the states are pairwise orthogonal: $\langle\psi_j|\psi_k\rangle = \delta_{jk}$. Give a projective measurement
that determines i with probability 1.
(b) (H) Suppose the states are pairwise almost orthogonal: $|\langle\psi_j|\psi_k\rangle| \ll 1/K^2$ for all distinct
$j,k \in [K]$. Define $E_i = \frac{2}{3}|\psi_i\rangle\langle\psi_i|$. Show that $I - \sum_{i=1}^{K} E_i$ is positive semidefinite.
(c) Under the same assumption as (b), give a POVM that determines i with success probability
at least 2/3.


5. (H) Suppose we have an efficient algorithm to produce, from a given undirected n-vertex
graph G, the following $n^2$-qubit state:


\[ a_G \sum_{\pi \in S_n} |\pi(G)\rangle, \]


where the basis states correspond to n × n adjacency matrices. Here $a_G$ is a scalar that makes
the norm equal to 1. Use this procedure to efficiently decide (with high success probability)
whether two given graphs $G_1$ and $G_2$ are isomorphic or not.


Chapter 7


Grover’s Search Algorithm 


The second-most important quantum algorithm after Shor’s is Grover’s search algorithm [125]. It 
doesn’t provide an exponential speed-up, only a quadratic speed-up, but it is much more widely 
applicable than Shor. 


7.1 The problem 


The search problem: 

For $N = 2^n$, we are given an arbitrary $x \in \{0,1\}^N$. The goal is to find an i such that $x_i = 1$ (and
to output ‘no solutions’ if there are no such i). We denote the number of solutions in x by t (i.e.,
t is the Hamming weight of x).


This problem may be viewed as a simplification of the problem of searching an N-slot unordered
database or search space, modeled by an N-bit string. Classically, a randomized algorithm would
need $\Theta(N)$ queries to solve the search problem. Grover’s algorithm solves it in $O(\sqrt{N})$ queries,
and $O(\sqrt{N} \log N)$ other gates (the number of gates can be reduced a bit further, see Exercise 9).


7.2  Grover’s algorithm 


Let $O_{x,\pm}|i\rangle = (-1)^{x_i}|i\rangle$ denote the ±-type oracle for the input x (i.e., a phase-query), and $R_0$ be
the unitary transformation that puts a −1 in front of all basis states $|i\rangle$ where $i \neq 0^n$, and that
does nothing to the basis state $|0^n\rangle$.¹ The Grover iterate is


\[ G = H^{\otimes n} R_0 H^{\otimes n} O_{x,\pm}. \qquad (7.1) \]


Note that 1 Grover iterate makes 1 query, and uses O(log N) other gates.
Grover’s algorithm starts in the n-bit state $|0^n\rangle$, applies a Hadamard transformation to each
qubit to get the uniform superposition $|U\rangle = \frac{1}{\sqrt{N}} \sum_i |i\rangle$ of all N indices, applies G to this state k
times (for some k to be chosen later), and then measures the final state. Intuitively, what happens
is that in each iteration some amplitude is moved from the indices of the 0-bits to the indices of
the 1-bits. The algorithm stops when almost all of the amplitude is on the 1-bits, in which case a
measurement of the final state will probably give the index of a 1-bit. Figure 7.1 illustrates this.


¹This unitary $R_0$ is independent of x, and can be implemented using O(n) elementary gates (Exercise 2.8.a).


[Circuit diagram: n qubits start in $|0\rangle$, each passes through a Hadamard gate H, then G is applied k times, followed by a measurement.]


Figure 7.1: Grover’s algorithm, with k Grover iterates


In order to analyze this, define the following “good” and “bad” states, corresponding to the 
solutions and non-solutions, respectively: 


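\[ |G\rangle = \frac{1}{\sqrt{t}} \sum_{i : x_i = 1} |i\rangle \quad \text{and} \quad |B\rangle = \frac{1}{\sqrt{N-t}} \sum_{i : x_i = 0} |i\rangle. \]


Then the uniform state over all indices can be written as

\[ |U\rangle = \frac{1}{\sqrt{N}} \sum_{i=0}^{N-1} |i\rangle = \sin(\theta)|G\rangle + \cos(\theta)|B\rangle, \quad \text{for } \theta = \arcsin(\sqrt{t/N}). \]


The Grover iterate G is actually the product of two reflections.² Firstly, $O_{x,\pm}$ is a reflection through
the subspace V spanned by the basis states that are not solutions; restricted to the 2-dimensional
space spanned by $|G\rangle$ and $|B\rangle$ this is in fact just a reflection through the state $|B\rangle$. Secondly,


\[ H^{\otimes n} R_0 H^{\otimes n} = H^{\otimes n} (2|0^n\rangle\langle 0^n| - I) H^{\otimes n} = 2 H^{\otimes n}|0^n\rangle\langle 0^n| H^{\otimes n} - H^{\otimes n} I H^{\otimes n} = 2|U\rangle\langle U| - I \]


is a reflection through $|U\rangle$.
Here is Grover’s algorithm restated, assuming we know the fraction of solutions is $\varepsilon = t/N$:


1. Set up the starting state $|U\rangle = H^{\otimes n}|0^n\rangle$
2. Repeat the following $k = O(1/\sqrt{\varepsilon})$ times:


(a) Reflect through $|B\rangle$ (i.e., apply $O_{x,\pm}$)
(b) Reflect through $|U\rangle$ (i.e., apply $H^{\otimes n} R_0 H^{\otimes n}$)


3. Measure the first register and check that the resulting i is a solution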
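
For small N the two reflections can be simulated classically by tracking the full amplitude vector. The following Python sketch is illustrative only: the function name `grover_probabilities` and the example string are assumptions, and the simulation takes time linear in N per iterate, so it demonstrates the amplitude dynamics rather than any speed-up. Step (a) flips the sign of every solution amplitude, and step (b), the reflection through the uniform state, maps each amplitude to twice the mean minus itself.

```python
import math

def grover_probabilities(x, k):
    """Success probability after 0, 1, ..., k Grover iterates on the bit string x."""
    N = len(x)
    amp = [1 / math.sqrt(N)] * N                      # uniform state |U>
    probs = []
    for step in range(k + 1):
        # probability of measuring an index i with x_i = 1
        probs.append(sum(a * a for a, bit in zip(amp, x) if bit == 1))
        if step == k:
            break
        # (a) reflect through |B>: put a minus sign in front of the solutions
        amp = [-a if bit == 1 else a for a, bit in zip(amp, x)]
        # (b) reflect through |U>: amplitude -> 2 * mean - amplitude
        mean = sum(amp) / N
        amp = [2 * mean - a for a in amp]
    return probs

# Example: N = 16 with t = 4 solutions, i.e. t = N/4.
x = [0] * 16
for i in (3, 7, 9, 12):
    x[i] = 1
probs = grover_probabilities(x, 2)
print(probs)
```

In this example one iterate already moves all amplitude onto the solutions, and a second iterate moves most of it away again, which is why the number of iterations has to be chosen with care.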


²A reflection through a subspace V is a unitary A such that Av = v for all vectors $v \in V$, and Aw = −w for
all w orthogonal to V. Note that we can write $A = 2P_V - I$, where $P_V$ is the projector onto subspace V. If V is
1-dimensional, spanned by vector u, we also call this a reflection through u.

Geometric argument: There is a fairly simple geometric argument why the algorithm works.
The analysis is in the 2-dimensional real plane spanned by $|B\rangle$ and $|G\rangle$. We start with


\[ |U\rangle = \sin(\theta)|G\rangle + \cos(\theta)|B\rangle. \]


The two reflections (a) and (b) increase the angle from $\theta$ to $3\theta$, moving us towards the good state
as illustrated in Figure 7.2.


Figure 7.2: The first iteration of Grover: (picture on the left) start with $|U\rangle$; (middle) reflect
through $|B\rangle$ to get $O_{x,\pm}|U\rangle$; (right) reflect through $|U\rangle$ to get $G|U\rangle$


The next two reflections (a) and (b) increase the angle with another $2\theta$, etc. More generally,
after k applications of (a) and (b) our state has become


\[ \sin((2k+1)\theta)|G\rangle + \cos((2k+1)\theta)|B\rangle. \]

If we now measure, the probability of seeing a solution is $P_k = \sin((2k+1)\theta)^2$. We want $P_k$ to be
as close to 1 as possible. Note that if we can choose $\tilde{k} = \frac{\pi}{4\theta} - 1/2$, then $(2\tilde{k}+1)\theta = \pi/2$ and hence
$P_{\tilde{k}} = \sin(\pi/2)^2 = 1$. An example where this works is if t = N/4, for then $\theta = \pi/6$ and $\tilde{k} = 1$.
Unfortunately $\tilde{k} = \frac{\pi}{4\theta} - 1/2$ will usually not be an integer, and we can only do an integer number
of Grover iterations. However, if we choose k to be the integer closest to $\tilde{k}$, then our final state will
still be close to $|G\rangle$ and the failure probability will still be small (assuming $t \ll N$):


\begin{align*}
1 - P_k &= \cos((2k+1)\theta)^2 = \cos((2\tilde{k}+1)\theta + 2(k-\tilde{k})\theta)^2 \\
        &= \cos(\pi/2 + 2(k-\tilde{k})\theta)^2 = \sin(2(k-\tilde{k})\theta)^2 \le \sin(\theta)^2 = \frac{t}{N},
\end{align*}


where we used $|k - \tilde{k}| \le 1/2$. Since $\theta = \arcsin(\sqrt{t/N}) \ge \sqrt{t/N}$, the number of queries is $k \le \frac{\pi}{4\theta} \le \frac{\pi}{4}\sqrt{\frac{N}{t}}$.
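
The choice of k and the resulting bounds can be checked numerically. This is a minimal sketch, assuming t is known and at least 1; the function name `best_iterations` is hypothetical. It computes the angle, rounds π/(4θ) − 1/2 to the nearest integer, and evaluates the success probability sin²((2k+1)θ).

```python
import math

def best_iterations(N, t):
    """Return (k, success probability) for the integer k closest to pi/(4*theta) - 1/2."""
    theta = math.asin(math.sqrt(t / N))
    k_tilde = math.pi / (4 * theta) - 0.5       # ideal, usually non-integer
    k = round(k_tilde)
    p_success = math.sin((2 * k + 1) * theta) ** 2
    return k, p_success

for N, t in [(16, 4), (1024, 1), (2 ** 20, 1)]:
    k, p = best_iterations(N, t)
    # failure probability is at most t/N, and k is at most (pi/4) * sqrt(N/t)
    print(N, t, k, p, 1 - p <= t / N + 1e-12, k <= math.pi / 4 * math.sqrt(N / t))
```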
Algebraic argument: For those who don’t like geometry, here’s an alternative (but equivalent)
algebraic argument. Let $a_k$ denote the amplitude of the indices of the t 1-bits after k Grover
iterates, and $b_k$ the amplitude of the indices of the 0-bits. Initially, for the uniform superposition
$|U\rangle$ we have $a_0 = b_0 = 1/\sqrt{N}$. Using that $H^{\otimes n} R_0 H^{\otimes n} = \frac{2}{N} J - I$, where J is the N × N all-1s
matrix, we find the following recursion:


\begin{align*}
a_{k+1} &= \frac{N-2t}{N} a_k + \frac{2(N-t)}{N} b_k \\
b_{k+1} &= \frac{-2t}{N} a_k + \frac{N-2t}{N} b_k
\end{align*}

The following formulas, due to Boyer et al. [62], provide a closed form for $a_k$ and $b_k$ (which may be
verified by substituting them into the recursion). With $\theta = \arcsin(\sqrt{t/N})$ as before, define


\begin{align*}
a_k &= \frac{1}{\sqrt{t}} \sin((2k+1)\theta) \\
b_k &= \frac{1}{\sqrt{N-t}} \cos((2k+1)\theta)
\end{align*}


Accordingly, after k iterations the success probability (the sum of squares of the amplitudes of the
locations of the t 1-bits) is the same as in the geometric analysis


\[ P_k = t \cdot a_k^2 = (\sin((2k+1)\theta))^2. \]
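
The agreement between the recursion and the closed form can also be checked numerically rather than by substitution. In the sketch below, the function names `recursion` and `closed_form` and the sample values N = 64, t = 3 are illustrative assumptions; the recursion written in the code is the one obtained by applying the phase oracle and then the reflection (2/N)J − I to amplitudes that are constant on solutions and on non-solutions.

```python
import math

def recursion(N, t, steps):
    """Iterate the amplitude recursion; returns [(a_0, b_0), ..., (a_steps, b_steps)]."""
    a = b = 1 / math.sqrt(N)
    history = [(a, b)]
    for _ in range(steps):
        # simultaneous update: both right-hand sides use the old a and b
        a, b = ((N - 2 * t) / N * a + 2 * (N - t) / N * b,
                -2 * t / N * a + (N - 2 * t) / N * b)
        history.append((a, b))
    return history

def closed_form(N, t, k):
    """Closed form for (a_k, b_k) with theta = arcsin(sqrt(t/N))."""
    theta = math.asin(math.sqrt(t / N))
    return (math.sin((2 * k + 1) * theta) / math.sqrt(t),
            math.cos((2 * k + 1) * theta) / math.sqrt(N - t))

N, t = 64, 3
for k, (a_k, b_k) in enumerate(recursion(N, t, 10)):
    a_c, b_c = closed_form(N, t, k)
    print(k, abs(a_k - a_c) < 1e-9, abs(b_k - b_c) < 1e-9)
```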


Thus we have a bounded-error quantum search algorithm with $O(\sqrt{N/t})$ queries, assuming we
know t. We now list (without full proofs) a number of useful variants of Grover:


• If we know t exactly, then the algorithm can be tweaked to end up in exactly the good state.
Roughly speaking, you can make the angle $\theta$ slightly smaller, such that $\tilde{k} = \pi/4\theta - 1/2$
becomes an integer (see Exercise 5).


• If we do not know t, then there is a problem: we do not know which k to use, so we do
not know when to stop doing the Grover iterates. Note that if k gets too big, the success
probability $P_k = (\sin((2k+1)\theta))^2$ goes down again! However, a slightly more complicated
algorithm due to [62] (basically running the above algorithm with exponentially increasing
guesses for k) shows that an expected number of $O(\sqrt{N/t})$ queries still suffices to find a
solution if there are t solutions. If there is no solution (t = 0), then we can easily detect that
by checking $x_i$ for the i that the algorithm outputs.


• If we know a lower bound $\tau$ on the actual (possibly unknown) number of solutions t, then the
above algorithm uses an expected number of $O(\sqrt{N/\tau})$ queries. If we run this algorithm for up
to three times its expected number of queries, then (by Markov’s inequality) with probability
at least 2/3 it will have found a solution. This way we can turn an expected runtime into a
worst-case runtime.


• If we do not know t but would like to reduce the probability of not finding a solution to some
small $\varepsilon > 0$, then we can do this using $O(\sqrt{N \log(1/\varepsilon)})$ queries (see Exercise 6).
NB: The important part here is that the $\log(1/\varepsilon)$ is inside the square-root; usual error-reduction
by $O(\log(1/\varepsilon))$ repetitions of basic Grover would give the worse upper bound of
$O(\sqrt{N} \log(1/\varepsilon))$ queries.
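
The idea of exponentially increasing guesses for k can be illustrated with a classical simulation of the measurement statistics. This is only a sketch under stated assumptions: it does not simulate a quantum state but draws each measurement outcome from the known success probability sin²((2j+1)θ); the growth factor 6/5 and the cap at √N follow the usual presentation of the algorithm of [62] as far as I recall it and should be treated as assumptions; the function name `search_unknown_t` is hypothetical; and t ≥ 1 is required (for t = 0 the loop would not terminate).

```python
import math
import random

def search_unknown_t(N, t, rng, growth=6 / 5):
    """Simulate the outcome statistics; return the number of queries until a solution is found."""
    theta = math.asin(math.sqrt(t / N))
    m, queries = 1.0, 0
    while True:
        j = rng.randrange(math.ceil(m))      # random number of iterates below the current guess
        queries += j + 1                     # j Grover iterates plus one query to check the outcome
        if rng.random() < math.sin((2 * j + 1) * theta) ** 2:
            return queries                   # the measured index was a solution
        m = min(growth * m, math.sqrt(N))    # otherwise increase the guess

rng = random.Random(0)
runs = [search_unknown_t(1024, 1, rng) for _ in range(200)]
print(sum(runs) / len(runs), math.sqrt(1024))
```

The simulated algorithm never uses t to decide when to stop; t enters only through the success probability of each simulated measurement.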


7.3 Amplitude amplification 


The analysis that worked for Grover’s algorithm is actually much more generally applicable (we 
will also see it again in the next chapter). In this section we describe a very similar procedure that 
allows us to amplify the “good” part of the outcome of an algorithm. Quite abstractly, suppose we 
have a quantum circuit A (without measurements) that acts on m qubits, such that 


\[ A|0^m\rangle = \sqrt{p}\,|\psi_1\rangle + \sqrt{1-p}\,|\psi_0\rangle, \]

where $|\psi_1\rangle$ and $|\psi_0\rangle$ are normalized m-qubit states that are orthogonal to each other (it could for
instance be that the last qubit of $|\psi_1\rangle$ is $|1\rangle$ and the last qubit of $|\psi_0\rangle$ is $|0\rangle$). For some reason we
like the state $|\psi_1\rangle$ and we want to increase its “weight” $\sqrt{p}$ in the superposition. The following
procedure achieves this.

In analogy with the analysis of Grover, think of $|\psi_1\rangle$ as the “good state” and $|\psi_0\rangle$ as the “bad
state,” and view these two states as the vertical and horizontal axes in a 2-dimensional picture. Our
starting state will be $|U\rangle = A|0^m\rangle$, which plays the role that the uniform state played in Grover,
and which is of course easy to obtain by applying A once to basis state $|0^m\rangle$. The angle between
$|U\rangle$ and the horizontal axis is $\theta = \arcsin(\sqrt{p})$. We would like to rotate this initial state towards the
good state, i.e., towards the vertical axis. As for Grover, we could implement the desired rotation
as a product of two reflections: a reflection through the bad state and a reflection through $|U\rangle$.

For the first reflection, suppose we have a circuit $R_G$ that can somehow distinguish the good
state from the bad state by putting a “−” in front of $|\psi_1\rangle$ and leaving $|\psi_0\rangle$ alone. For example, if
the last qubit of $|\psi_1\rangle$ is $|1\rangle$ and the last qubit of $|\psi_0\rangle$ is $|0\rangle$, then $R_G$ would be extremely easy: it
would just apply a Z-gate to the last qubit. The second reflection can be implemented as


\[ A R_0 A^{-1}. \]


It is easy to check that this maps the state $|U\rangle = A|0^m\rangle$ to itself, while every state orthogonal to
$|U\rangle$ gets a “−” in front of it, so indeed this reflects through $|U\rangle$. Like before, the product of these
two reflections corresponds to a rotation by an angle of $2\theta$ in the 2-dimensional picture.

The following amplitude amplification procedure from [66] increases the amplitude of the good
state to be close to 1:


1. Set up the starting state $|U\rangle = A|0^m\rangle$
2. Repeat the following $O(1/\sqrt{p})$ times:


(a) Reflect through the bad state $|\psi_0\rangle$ (i.e., apply $R_G$)
(b) Reflect through $|U\rangle$ (i.e., apply $A R_0 A^{-1}$)


The analysis is the same as for Grover: the initial angle between our algorithm’s state and the
horizontal axis is $\theta$ and every iteration increases this angle by $2\theta$, so after k iterations our state is


\[ \sin((2k+1)\theta)|\psi_1\rangle + \cos((2k+1)\theta)|\psi_0\rangle. \]


We would like to end up with angle $(2k+1)\theta = \pi/2$, because then the amplitude of the good state
$|\psi_1\rangle$ would be close to $\sin(\pi/2) = 1$. Hence, like before, we choose k to be $\frac{\pi}{4\theta} - 1/2$ rounded to the
nearest integer. This is $O(1/\sqrt{p})$. If we do not know in advance what p is, then we can try out
exponentially decreasing guesses for its value, similar to how we handle the case of Grover with
unknown number of solutions.
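
The iteration count and the resulting success probability are easy to tabulate for a given p. The sketch below is illustrative: the function names `amplified` and `classical_repetitions` are hypothetical, p is assumed known and strictly between 0 and 1, and the classical count is the standard number of independent repetitions after which at least one run succeeds with the target probability, included only for comparison.

```python
import math

def amplified(p):
    """Return (k, success probability) after k rounds of amplitude amplification."""
    theta = math.asin(math.sqrt(p))
    k = max(0, round(math.pi / (4 * theta) - 0.5))
    return k, math.sin((2 * k + 1) * theta) ** 2

def classical_repetitions(p, target):
    """Independent runs needed so that at least one succeeds with probability >= target."""
    return math.ceil(math.log(1 - target) / math.log(1 - p))

p = 0.01
k, prob = amplified(p)
# Each round applies A and its inverse once, plus one initial application of A.
print(k, 2 * k + 1, prob, classical_repetitions(p, 0.99))
```

For p = 0.01 this gives a handful of amplification rounds against several hundred classical repetitions, which is the 1/√p versus 1/p scaling in concrete numbers.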

Note that the Hadamard transform $H^{\otimes n}$ can be viewed as an algorithm with success probability
p = t/N for a search problem of size N with t solutions, because $H^{\otimes n}|0^n\rangle$ is the uniform superposition
over all N locations. Hence Grover’s algorithm is a special case of amplitude amplification,
where m = n, $A = A^{-1} = H^{\otimes n}$, and $R_G$ corresponds to a phase-query to x.

Amplitude amplification allows us to speed up a very large class of classical algorithms: any
algorithm A that has some non-trivial probability p of finding a solution to whatever problem we’re
trying to solve can be amplified to success probability nearly 1 by $O(1/\sqrt{p})$ runs of A and $A^{-1}$,
provided we can efficiently “recognize” solutions, i.e., implement $R_G$. In contrast, classically we
would need to repeat A O(1/p) times before we have success probability close to 1.

7.4 Application: satisfiability


Grover’s algorithm has many applications: basically any classical algorithm that has some search- 
component can be improved using Grover’s algorithm as a subroutine. This includes many basic 
computer applications such as finding shortest paths and minimum spanning trees, various other 
graph algorithms, etc. 

We can also use it to speed up the computation of NP-complete problems (see Chapter 13 for
the complexity class NP), albeit only quadratically, not exponentially. As an example, consider
the satisfiability problem: we are given a Boolean formula $\phi(i_1,\ldots,i_n)$ and want to know if it has
a satisfying assignment, i.e., a setting of the bits $i_1,\ldots,i_n$ that makes $\phi(i_1,\ldots,i_n) = 1$. A classical
brute-force search along all $2^n$ possible assignments takes time roughly $2^n$.

To find a satisfying assignment faster, define the $N = 2^n$-bit input to Grover’s algorithm by
$x_i = \phi(i)$, where $i \in \{0,1\}^n$. For a given assignment $i = i_1 \ldots i_n$ it is easy to compute $\phi(i)$
classically in polynomial time. We can write that computation as a reversible circuit (using only
Toffoli gates), corresponding to a unitary $U_\phi$ that maps $|i,0,0\rangle \mapsto |i,\phi(i),w_i\rangle$, where the third
register holds some classical workspace the computation may have needed. To apply Grover we
need an oracle that puts the answer in the phase and doesn’t leave workspace around (as that
could mess up the interference effects, see Exercise 2.10 for an example). Define $O_\phi$ as the unitary
that first applies $U_\phi$, then applies a Z-gate to the second register, and then applies $U_\phi^{-1}$ to “clean
up” the workspace again. This has the form we need for Grover: $O_{x,\pm}|i\rangle = (-1)^{x_i}|i\rangle$; here we did
not explicitly write the workspace qubits, which start and end in $|0\rangle$. Now we can run Grover and
find a satisfying assignment with high probability if there is one, using a number of elementary
operations that is $\sqrt{2^n}$ times some polynomial factor.
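
The reduction from satisfiability to search can be illustrated on a toy instance. The sketch below is only a classical simulation under stated assumptions: the 3-variable formula `phi` is a made-up example, the number of solutions t is counted by brute force just to pick the number of iterates (a real run would not know t), and the phase oracle is applied directly to the amplitude vector instead of being built from a reversible circuit with workspace clean-up.

```python
import math
from itertools import product

def phi(bits):
    """Hypothetical formula: (i1 or i2) and (not i1 or i3) and (not i2 or not i3)."""
    i1, i2, i3 = bits
    return int((i1 or i2) and ((not i1) or i3) and ((not i2) or (not i3)))

n = 3
assignments = list(product([0, 1], repeat=n))
x = [phi(a) for a in assignments]            # the N = 2^n bit string with x_i = phi(i)
N, t = len(x), sum(x)                        # t counted classically, for illustration only

theta = math.asin(math.sqrt(t / N))
k = round(math.pi / (4 * theta) - 0.5)       # number of Grover iterates

amp = [1 / math.sqrt(N)] * N
for _ in range(k):
    amp = [-a if bit else a for a, bit in zip(amp, x)]   # phase oracle: (-1)^{x_i}
    mean = sum(amp) / N
    amp = [2 * mean - a for a in amp]                    # reflection through the uniform state

p_sat = sum(a * a for a, bit in zip(amp, x) if bit)
best = assignments[max(range(N), key=lambda i: amp[i] ** 2)]
print(t, k, p_sat, best)
```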

If brute-force search is basically the best thing we can do classically to solve some particular NP- 
hard problem, then that computation can be sped up quadratically on a quantum computer using 
Grover search like above. However, there are also NP-hard problems where we know algorithms 
that still run in exponential time, but that are much faster than brute-force search. For example, 
consider the famous Traveling Salesman Problem (TSP): given an n-vertex graph with weights 
(distances) on the edges, find the shortest tour in this graph that visits every node exactly once. 
Since there are (n − 1)! many different tours, classical brute-force search would take time (n − 1)!,
times some polynomial in n. Grover’s algorithm could speed this up quadratically. However, there
are much more clever classical algorithms for TSP. In particular, the Bellman-Held-Karp dynamic
programming algorithm solves TSP in time $2^n$, times a polynomial in n. This algorithm is much
faster than $O(\sqrt{n!})$ (which is roughly $(n/e)^{n/2}$), and is not amenable to a straightforward speed-up
using Grover. Nevertheless, it turns out quantum computers can still solve TSP polynomially
faster than the best known classical algorithms, albeit in a much more complicated way than by
just applying Grover [16].
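
To get a feel for these growth rates, one can compare the base-2 logarithms of the three running times while ignoring all polynomial factors. The sketch below is illustrative only; the function name `log2_costs` is hypothetical, and it uses `math.lgamma(n)`, which equals ln((n−1)!).

```python
import math

def log2_costs(n):
    """log2 of (n-1)!, of sqrt((n-1)!), and of 2^n, ignoring polynomial factors."""
    log2_factorial = math.lgamma(n) / math.log(2)     # lgamma(n) = ln((n-1)!)
    return log2_factorial, log2_factorial / 2, float(n)

for n in (10, 20, 30, 50):
    brute, grover, dynamic = log2_costs(n)
    print(n, round(brute, 1), round(grover, 1), dynamic)
```

For small n the square root of (n−1)! is still below $2^n$, but for larger n the dynamic-programming bound wins by a rapidly growing margin.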


Exercises 


1. (a) Suppose n = 2, and $x = x_{00} x_{01} x_{10} x_{11} = 0001$. Give the specific initial state, three
intermediate states, and final state in Grover’s algorithm, for k = 1 iterations (using
the decomposition of one Grover iterate into a product of four unitaries from Eq. (7.1)).
What is the success probability?


(b) Give the final state after k = 2 iterations. What is now the success probability?


2. (a) Suppose you have a quantum algorithm for some computational problem that takes
$\sqrt{N}$ operations on inputs of size N, each operation of constant cost C. And the best-possible
classical algorithm for the same computational problem takes N operations,
each of constant cost c. Suppose C is much larger than c (which is certainly the case in
the current state of quantum technology: doing one elementary quantum gate is much
more expensive than doing one classical logic gate). How large does the input-size N
have to be before the quantum algorithm has lower cost than the best-possible classical
algorithm?


(b) Suppose you have a quantum algorithm of cost $C\sqrt{2^n}$ for satisfiability of n-variable
Boolean formulas, where the best-possible classical algorithm has cost $c\,2^n$, and again
C is much larger than c. How large does n have to be before the quantum algorithm has
lower cost than the best-possible classical algorithm?


3. Show that if the number of solutions is t = N/4, then Grover’s algorithm always finds a
solution with certainty after just one query. How many queries would a classical algorithm
need to find a solution with certainty if t = N/4? And if we allow the classical algorithm
error probability 1/10?


4. Suppose we have a string of $N = 2^n$ bits, containing t ones (solutions) and N − t zeroes. You
may assume you know the number t.


(a) Show that we can use Grover’s algorithm to find the positions of all t ones, using an
expected number of $O(t\sqrt{N})$ queries. You can argue on a high level, no need to draw
actual quantum circuits.


(b) (H) Show that this can be improved to an expected number of $O(\sqrt{tN})$ queries.


5. At the end of Section 7.2 we claimed without proof that Grover’s algorithm can be tweaked to
work with probability 1 if we know the number of solutions exactly. For $N = 2^n$, this question
asks you to provide such an exact algorithm for an $x \in \{0,1\}^N$ with a unique solution (so we
are promised that there is exactly one $i \in \{0,1\}^n$ with $x_i = 1$, and our goal is to find this i).


(a) Give the success probability of the basic version of Grover’s algorithm after k iterations.


(b) Suppose the optimal number of iterations $\tilde{k} = \frac{\pi}{4\arcsin(1/\sqrt{N})} - \frac{1}{2}$ is not an integer. Show
that if we round $\tilde{k}$ up to the nearest integer, doing $\lceil\tilde{k}\rceil$ iterations, then the algorithm
will have success probability strictly less than 1.


(c) Define a new 2N-bit string $y \in \{0,1\}^{2N}$, indexed by (n+1)-bit strings $j = j_1 \ldots j_n j_{n+1}$,
by setting

\[ y_j = \begin{cases} 1 & \text{if } x_{j_1 \ldots j_n} = 1 \text{ and } j_{n+1} = 0, \\ 0 & \text{otherwise.} \end{cases} \]


Show how you can implement the following (n + 1)-qubit unitary

\[ S_y : |j\rangle \mapsto (-1)^{y_j}|j\rangle, \]


using one query to x (of the usual form $O_x : |i,b\rangle \mapsto |i, b \oplus x_i\rangle$) and a few elementary
gates.


(d) Let $\gamma \in [0,2\pi)$ and let $U_\gamma = \begin{pmatrix} \cos\gamma & -\sin\gamma \\ \sin\gamma & \cos\gamma \end{pmatrix}$ be the corresponding rotation matrix.
Let $A = H^{\otimes n} \otimes U_\gamma$ be an (n + 1)-qubit unitary. What is the probability (as a function
of $\gamma$) that measuring the state $A|0^{n+1}\rangle$ in the computational basis gives a solution
$j \in \{0,1\}^{n+1}$ for y (i.e., such that $y_j = 1$)?

(e) (H) Give a quantum algorithm that finds the unique solution in string x with probability 1
using $O(\sqrt{N})$ queries to x.


6. Given query access to $x \in \{0,1\}^N$, with unknown Hamming weight t = |x|, we want to find
a solution, i.e., an index $i \in \{0,\ldots,N-1\}$ such that $x_i = 1$. If $x = 0^N$ then our search
algorithm should output “no solution.”


(a) (H) Suppose we know an integer s such that $t \in \{1,\ldots,s\}$. Give a quantum algorithm
that finds a solution with probability 1, using $O(\sqrt{sN})$ queries to x.

(b) Suppose we know that $t \in \{s+1,\ldots,N\}$. Give a quantum algorithm that finds a
solution with probability at least $1 - 2^{-s}$, using $O(\sqrt{sN})$ queries to x.


(c) For given $\varepsilon > 2^{-N}$, give a quantum algorithm that solves the search problem with
probability $\ge 1 - \varepsilon$ using $O(\sqrt{N \log(1/\varepsilon)})$ queries, without assuming anything about t.


7. (H) Here we will approximately count the number of 1s in a string $x \in \{0,1\}^N$. Let t = |x|
denote that (unknown) number.


(a) Given an integer $m \in \{1,\ldots,N\}$, describe a quantum algorithm that makes $O(\sqrt{N/m})$
queries to x and decides between the cases $t \le m/2$ and $t \in [m,2m]$ with probability
at least 2/3. That is, the algorithm has to output 0 with probability $\ge 2/3$ whenever
$t \le m/2$, has to output 1 with probability $\ge 2/3$ whenever $t \in [m,2m]$, and can output
whatever it wants for other values of t.

(b) Give a quantum algorithm that uses $O(\sqrt{N} \log\log N)$ queries to x and that outputs an
integer m such that, with probability $\ge 2/3$, the unknown t lies between m/2 and 2m.


8. Suppose we have a quantum circuit A acting on m qubits, such that

\[ A|0^m\rangle = \sqrt{a}\,|\phi_1\rangle|1\rangle + \sqrt{1-a}\,|\phi_0\rangle|0\rangle, \]


where $|\phi_1\rangle$ and $|\phi_0\rangle$ are arbitrary normalized (m − 1)-qubit states, and $a \in [0,1/4]$ is an
unknown number. Our goal is to estimate a (this is known as amplitude estimation). Let S
be the 2-dimensional subspace spanned by $|\phi_1\rangle|1\rangle$ and $|\phi_0\rangle|0\rangle$.


(a) Show that in S, the unitary $I \otimes Z$ (where I is the identity on m − 1 qubits and Z is the
phase-flip gate) is a reflection through $|\phi_0\rangle|0\rangle$.


(b) Let $R_0 = 2|0^m\rangle\langle 0^m| - I$ be a reflection through $|0^m\rangle$. Show that in S, the unitary
$A R_0 A^{-1}$ is a reflection through $A|0^m\rangle$.


(c) Show that in S, the unitary $U = A R_0 A^{-1} \cdot (I \otimes Z)$ is a rotation over angle $2\theta$, where
$\theta = \arcsin\sqrt{a}$.


(d) (H) Given some $\varepsilon \in (0,1/2)$, show how you can use phase estimation (Section 4.6) with
$O(1/\varepsilon)$ applications of U to find an approximation $\tilde{a}$ of a such that $|\sqrt{\tilde{a}} - \sqrt{a}| \le \varepsilon$.

(e) (H) Suppose we have query-access to a string $x \in \{0,1\}^N$ of unknown Hamming weight
t = |x|. Use (d) to compute an integer $\tilde{t}$ such that $|\tilde{t} - t| \le \sqrt{N}$ with success probability
$\ge 2/3$, using $O(\sqrt{N})$ queries to x.


(f) Suppose $x \in \{0,1\}^N$ has $|x| \in \{0,1\}$. Use (d) to compute |x| with success probability
$\ge 2/3$, using $O(\sqrt{N})$ queries to x (you may not invoke Grover here).


9. Suppose you are given query access to $x \in \{0,1\}^N$, where |x| = 1 and $N = 2^n$. You want to
find the unique index $i = i_{n-1} \ldots i_0 \in \{0,1\}^n$ such that $x_i = 1$.


(a) Let $k \in \{1,\ldots,n-1\}$. Fix the first n − k bits of the n-bit index $i_{n-1} \ldots i_0$ to specific
values $i_{n-1} \ldots i_k$. Give a quantum algorithm to find a solution with success probability 1
(if one exists) among the $2^k$ indices $i \in \{0,1\}^n$ that start with $i_{n-1} \ldots i_k$, using $O(\sqrt{2^k})$
queries and $O(\sqrt{2^k}\,k)$ other gates.


(b) Give a quantum algorithm that solves the search problem on x using $O(\sqrt{N})$ queries
and $O(\sqrt{N} \log\log N)$ other gates.
Comment: The $O(\sqrt{N} \log\log N)$ gates achieved here is better than standard Grover, which uses $O(\sqrt{N} \log N)$
gates. The $\log\log N$ can be reduced a bit further, to nearly-constant [28].


10. (H) Let $x = x_0 \ldots x_{N-1}$ be a sequence of distinct real numbers, where $N = 2^n$, and each $x_i$
can be written exactly using b bits. We can query these in the usual way, i.e., we can apply
(n+b)-qubit unitary $O_x : |i,0^b\rangle \mapsto |i,x_i\rangle$, as well as its inverse. The minimum of x is defined
as $\min\{x_i \mid i \in \{0,\ldots,N-1\}\}$. Give a quantum algorithm that finds (with probability
$\ge 2/3$) an index achieving the minimum, using at most $O(\sqrt{N} \log N)$ queries to the input,
and prove that this algorithm works.


Bonus: give a quantum algorithm that uses $O(\sqrt{N})$ queries.


11. Let $x = x_0 \ldots x_{N-1}$, where $N = 2^n$ and $x_i \in \{0,1\}^n$, be an input that we can query in the
usual way. We are promised that this input is 2-to-1: for each i there is exactly one other j
such that $x_i = x_j$.³ Such an (i, j)-pair is called a collision.


(a) Suppose S is a uniformly randomly chosen set of $s \le N/2$ elements of $\{0,\ldots,N-1\}$.
What is the probability that there exists a collision in S?


(b) (H) Give a classical randomized algorithm that finds a collision (with probability $\ge 2/3$)
using $O(\sqrt{N})$ queries to x.


(c) (H) Give a quantum algorithm that finds a collision (with probability $\ge 2/3$) using
$O(N^{1/3})$ queries.


12. Consider an undirected graph G = (V, E), with vertex set V = {1,...,n} and edge-set E.
We say G is connected if, for every pair of vertices $i,j \in V$, there is a path between i and
j in the graph. The adjacency matrix of G is the n × n Boolean matrix M where $M_{ij} = 1$
iff $(i,j) \in E$ (note that M is a symmetric matrix because G is undirected). Suppose we are
given input graph G in the form of a unitary that allows us to query whether an edge (i, j)
is present in G or not:

\[ O_M : |i,j,b\rangle \mapsto |i,j,b \oplus M_{ij}\rangle. \]


(a) Assume G is connected. Suppose we have a set A of edges which we already know to be
in the graph (so $A \subseteq E$; you can think of A as given classically, you don’t have to query
it). Let $G_A = (V, A)$ be the subgraph induced by only these edges, and suppose $G_A$ is
not connected, so it consists of c > 1 connected components. Call an edge $(i,j) \in E$
“good” if it connects two of these components. Give a quantum algorithm that finds a
good edge with an expected number of $O(n/\sqrt{c-1})$ queries to M.


(b) Give a quantum algorithm that uses at most $O(n^{3/2})$ queries to M and decides (with
success probability at least 2/3) whether G is connected or not.


(c) Show that classical algorithms for deciding (with success probability at least 2/3) whether
G is connected, need to make $\Omega(n^2)$ queries to M.


³The 2-to-1 inputs for Simon’s algorithm are a very special case of this, where the collisions are determined by a
secret string $s \in \{0,1\}^n$: $x_i$ equals $x_j$ if $i = j \oplus s$.


Chapter 8


Quantum Walk Algorithms 


8.1 Classical random walks 


Consider an undirected graph G with N vertices. Suppose at least an $\varepsilon$-fraction of the vertices are
“marked,” and we would like to find a marked vertex. One way to do this is with a random walk:


Start at some specific vertex y of the graph.
Repeat the following a number of times: Check if y is marked, and if not then choose
one of its neighbors at random and set y to be that neighbor.


This may seem like a stupid algorithm, but it has certain advantages. For instance, it only needs
space O(log N), because you only need to keep track of the current vertex y, and maybe a counter
that keeps track of how many steps you’ve already taken.¹ Such an algorithm can for example
decide whether there is a path from a specific vertex y to a specific vertex x using O(log N) space.
We’d start the walk at y and only x would be marked; one can show that if there exists a path
from y to x in G, then we will reach x in poly(N) steps.

Let us restrict attention to d-regular graphs without self-loops, so each vertex has exactly d
neighbors. A random walk on such a graph G corresponds to an N × N symmetric matrix P,
where $P_{xy} = 1/d$ if (x,y) is an edge in G, and $P_{xy} = 0$ otherwise. This P is the normalized
adjacency matrix of G. If $v \in \mathbb{R}^N$ is a vector with a 1 at position y and 0s elsewhere, then Pv is
a vector whose x-th entry is $(Pv)_x = 1/d$ if (x,y) is an edge, and $(Pv)_x = 0$ otherwise. In other
words, Pv is the uniform probability distribution over the neighbors of y, which is what you get by
taking one step of the random walk starting at y. More generally, if v is a probability distribution
on the vertices, then Pv is the new probability distribution on vertices after taking one step of the
random walk, and $P^k v$ is the probability distribution after taking k steps.
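
A small numerical example may help to see the matrix view of the walk. The sketch below is illustrative and makes its own choices: the graph is the 5-cycle (2-regular, connected, not bipartite), and the helper names `walk_matrix` and `step` are hypothetical. It builds the normalized adjacency matrix, takes one step from a distribution concentrated on vertex 0, and then iterates the walk.

```python
def walk_matrix(neighbors):
    """Normalized adjacency matrix: P[x][y] = 1/d if x is a neighbor of y, else 0."""
    N = len(neighbors)
    P = [[0.0] * N for _ in range(N)]
    for y, nbrs in enumerate(neighbors):
        for x in nbrs:
            P[x][y] = 1 / len(nbrs)
    return P

def step(P, v):
    """One step of the random walk: the matrix-vector product Pv."""
    N = len(v)
    return [sum(P[x][y] * v[y] for y in range(N)) for x in range(N)]

N = 5
cycle = [[(y - 1) % N, (y + 1) % N] for y in range(N)]   # neighbors on the 5-cycle
P = walk_matrix(cycle)

v = [1.0, 0.0, 0.0, 0.0, 0.0]        # start concentrated on vertex 0
one_step = step(P, v)                # uniform over the two neighbors of vertex 0
many_steps = v
for _ in range(100):
    many_steps = step(P, many_steps)
print(one_step, many_steps)
```

After many steps the distribution is numerically indistinguishable from uniform, as expected for a connected non-bipartite regular graph.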

Suppose we start with some probability-distribution vector v (which may or may not be concentrated
at one vertex y). We will assume G is connected and not bipartite. Then $P^k v$ will
converge to the uniform distribution over all vertices, and the speed of convergence is determined
by the “gap” between the first eigenvalue of P and all other eigenvalues. This can be seen as
follows. Let $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_N$ be the eigenvalues of P, ordered by size, and $v_1,\ldots,v_N$ be
corresponding orthogonal eigenvectors.² The largest eigenvalue is $\lambda_1 = 1$, and corresponds to the
eigenvector $v_1 = u = (1/N,\ldots,1/N)$, which is the uniform distribution over all vertices. One can show that
our assumption that G is connected implies $\lambda_2 < 1$, and our assumption that G is not bipartite
implies $\lambda_N > -1$. Hence all eigenvalues $\lambda_i$ for $i \in \{2,\ldots,N\}$ will be in (−1, 1); the corresponding
eigenvector $v_i$ will be orthogonal to the uniform vector u, so the sum of its entries is 0. Let $\delta > 0$
be the difference between $\lambda_1 = 1$ and $\max_{i \ge 2} |\lambda_i|$ (hence $|\lambda_i| \le 1 - \delta$ for all $i \ge 2$). This $\delta$ is called
the “spectral gap” of the graph.


¹Here we’re assuming the neighbors of a given vertex are efficiently computable, so you don’t actually need to
keep the whole graph in memory. This will be true for all graphs we consider here.

²Analyzing graphs by looking at the eigenvalues and eigenvectors of their adjacency matrix is called “algebraic
graph theory” or “spectral graph theory,” see for instance [70].

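Now decompose the starting distribution v as $v = \sum_{i=1}^N \alpha_i v_i$. Since the sum of v’s entries is 1,
and the sum of $v_1$’s entries is 1, while each other eigenvector $v_i$ ($i \geq 2$) has entries summing to 0, it
follows that $\alpha_1 = 1$. Now let us see what happens if we apply the random walk for k steps, starting
from v:

$$P^k v = P^k \left( \sum_i \alpha_i v_i \right) = \sum_i \alpha_i \lambda_i^k v_i = u + \sum_{i \geq 2} \alpha_i \lambda_i^k v_i.$$

Consider the squared norm of the difference between $P^k v$ and u:

$$\left\| P^k v - u \right\|^2 = \left\| \sum_{i \geq 2} \alpha_i \lambda_i^k v_i \right\|^2 = \sum_{i \geq 2} |\alpha_i|^2 |\lambda_i|^{2k} \leq \|v\|^2 (1 - \delta)^{2k}.$$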


Since v is a probability distribution, we have $\|v\|^2 \leq 1$. By choosing $k = \ln(1/\eta)/\delta$, we get
$\|P^k v - u\| \leq \eta$. In particular, if $\delta$ is not too small, then we get quick convergence of the random
walk to the uniform distribution u, no matter which distribution v we started with.³ Once we are
close to the uniform distribution, we have probability roughly $\varepsilon$ of hitting a marked vertex. Of
course, the same happens if we just pick a vertex uniformly at random, but that may not always
be an option if the graph is given implicitly.
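The convergence bound can be checked numerically on a small example. The sketch below assumes NumPy and uses the 7-cycle (connected and, being an odd cycle, not bipartite) purely as an illustration: it computes the spectral gap, picks $k = \lceil \ln(1/\eta)/\delta \rceil$, and measures the Euclidean distance between $P^k v$ and u.

```python
import numpy as np

N = 7                                    # odd cycle: connected, not bipartite
P = np.zeros((N, N))
for x in range(N):
    P[x, (x - 1) % N] = P[x, (x + 1) % N] = 0.5

# Eigenvalues sorted as lambda_1 >= lambda_2 >= ... >= lambda_N.
eigs = np.sort(np.linalg.eigvalsh(P))[::-1]
delta = 1.0 - np.max(np.abs(eigs[1:]))   # spectral gap

u = np.full(N, 1.0 / N)                  # uniform distribution
v = np.zeros(N)
v[0] = 1.0                               # start concentrated at one vertex

eta = 0.01
k = int(np.ceil(np.log(1 / eta) / delta))
dist = np.linalg.norm(np.linalg.matrix_power(P, k) @ v - u)

print(delta, k, dist)
```

The printed distance should come out below the chosen $\eta$, in line with the bound $(1-\delta)^k \leq e^{-\delta k}$.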

Suppose it costs S to set up an initial state v; it costs U to update the current vertex, i.e., to 
perform one step of the random walk; and it costs C to check whether a given vertex is marked. 
“Cost” is left undefined for now, but typically it will count number of queries to some input, or 
number of elementary operations. Consider a classical search algorithm that starts at v, and then 
repeats the following until it finds a marked vertex: check if the current vertex is marked, and if 
not run a random walk for roughly $1/\delta$ steps to get close to the uniform distribution. Ignoring
constant factors, the expected cost before this procedure finds a marked item, is on the order of


$$S + \frac{1}{\varepsilon}\left(C + \frac{1}{\delta}\,U\right). \qquad (8.1)$$


8.2 Quantum walks 


We will now modify the classical random walk algorithm preceding Eq. (8.1) to a quantum algo- 
rithm, where the distribution-preserving matrix P is changed to a norm-preserving matrix W(P) 
(i.e., a unitary). This is due to Magniez et al. [182], inspired by Szegedy [233]; our presentation is 
mostly based on Santha’s survey paper [216], to which we refer for more details and references. 
While the basis state of a classical random walk is the current vertex we are at, a basis state
of a quantum walk has two registers, the first corresponding to the current vertex and the second
corresponding to the previous vertex. Equivalently, a basis state of a quantum walk corresponds
to an edge of the graph.


³Convergence in total variation distance can be derived from this by Cauchy-Schwarz, choosing $\eta \ll 1/\sqrt{N}$.

Our resulting quantum walk algorithm for search will actually be quite analogous to Grover’s
algorithm. We’ll call a basis state $|x\rangle|y\rangle$ “good” if x is a marked vertex, and “bad” otherwise.
Define $|p_x\rangle = \sum_y \sqrt{P_{xy}}\,|y\rangle$ to be the uniform superposition over the neighbors of x. As for Grover,
define “good” and “bad” states as the superpositions over good and bad basis states:


$$|G\rangle = \frac{1}{\sqrt{|M|}} \sum_{x \in M} |x\rangle|p_x\rangle \quad \text{and} \quad |B\rangle = \frac{1}{\sqrt{N - |M|}} \sum_{x \notin M} |x\rangle|p_x\rangle,$$


where M denotes the set of marked vertices. Note that $|G\rangle$ is just the uniform superposition over
all edges (x, y) where the first coordinate is marked, and $|B\rangle$ is just the uniform superposition over
all edges (x, y) where the first coordinate is not marked.

If $\varepsilon = |M|/N$ and $\theta := \arcsin(\sqrt{\varepsilon})$ then the uniform state over all edges can be written as


$$|U\rangle = \frac{1}{\sqrt{N}} \sum_x |x\rangle|p_x\rangle = \sin(\theta)|G\rangle + \cos(\theta)|B\rangle.$$


Here is the algorithm for searching a marked vertex if an $\varepsilon$-fraction is marked⁴:
1. Set up the starting state $|U\rangle$
2. Repeat the following $O(1/\sqrt{\varepsilon})$ times:


(a) Reflect through $|B\rangle$
(b) Reflect through $|U\rangle$


3. Measure the first register and check that the resulting vertex x is marked.


The description and analysis of this algorithm takes place in the 2-dimensional space spanned
by $|G\rangle$ and $|B\rangle$. We’ll explain in a moment how to implement (a) and (b). Assuming we know
how to do that, the proof that this algorithm finds a marked vertex is the same as for Grover and
for amplitude amplification (Chapter 7). We start with $|U\rangle = \sin(\theta)|G\rangle + \cos(\theta)|B\rangle$. The two
reflections (a) and (b) increase the angle from $\theta$ to $3\theta$, moving us towards the good state (similarly
to the analysis for Grover, you can draw a 2-dimensional picture with axes $|B\rangle$ and $|G\rangle$ to see this).
More generally, after k applications of (a) and (b) our state has become


$$\sin((2k+1)\theta)|G\rangle + \cos((2k+1)\theta)|B\rangle.$$


Choosing $k \approx \frac{\pi}{4\theta} = O(1/\sqrt{\varepsilon})$, we will have $\sin((2k+1)\theta) \approx 1$, at which point measuring the first
register will probably yield a vertex x that is marked. We now look more closely at how to implement
the two kinds of reflections.
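The 2-dimensional picture can be simulated directly. The following sketch (plain Python; the function name and the choice $\varepsilon = 1/1024$ are illustrative assumptions) tracks the two real amplitudes on $|B\rangle$ and $|G\rangle$ and applies the two reflections as abstract operations on that plane, not as a quantum circuit.

```python
import math

def success_probability(eps, k):
    """Probability of measuring a marked vertex after k rounds of (a) and (b)."""
    theta = math.asin(math.sqrt(eps))
    b, g = math.cos(theta), math.sin(theta)   # amplitudes of |U> on |B>, |G>
    ub, ug = b, g                              # fixed copy of |U>
    for _ in range(k):
        g = -g                                 # (a) reflect through |B>
        overlap = ub * b + ug * g              # inner product <U|state>
        # (b) reflect through |U>: state -> 2<U|state>|U> - state
        b, g = 2 * overlap * ub - b, 2 * overlap * ug - g
    return g * g

eps = 1 / 1024
theta = math.asin(math.sqrt(eps))
k = round(math.pi / (4 * theta) - 0.5)         # makes (2k+1)*theta close to pi/2
print(k, success_probability(eps, k))
```

The amplitude on $|G\rangle$ after k rounds agrees with $\sin((2k+1)\theta)$, so the success probability is close to 1 for the chosen k and decreases again if the rotation overshoots.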


(a) Reflect through $|B\rangle$. Reflecting through $|B\rangle$ is relatively straightforward: we just have
to “recognize” whether the first register contains a marked x, and put a −1 if so (note that this
is really a reflection through the subspace spanned by the bad basis states, but restricted to the
2-dimensional subspace spanned by $|G\rangle$ and $|B\rangle$ that’s the same as a reflection through $|B\rangle$).


⁴As in Grover, if we don’t know $\varepsilon$ then we just run the algorithm repeatedly with exponentially decreasing guesses
for $\varepsilon$ (1/2, 1/4, 1/8, ...). If at the end we still haven’t found a marked item, we’ll conclude that probably none exists.

(b) Reflect through $|U\rangle$. This is where the quantum walk comes in. Let $\mathcal{A}$ be the subspace
span$\{|x\rangle|p_x\rangle\}$ and $\mathcal{B}$ be span$\{|p_y\rangle|y\rangle\}$. Let ref($\mathcal{A}$) and ref($\mathcal{B}$) denote reflections through $\mathcal{A}$ and $\mathcal{B}$,
respectively. Define W(P) = ref($\mathcal{B}$)ref($\mathcal{A}$) to be the product of these two reflections. This is the
unitary analogue of P. Suppose we are able to implement the following two operations (even in a
controlled manner):


(1) $|x\rangle|0\rangle \mapsto |x\rangle|p_x\rangle$
(2) $|0\rangle|y\rangle \mapsto |p_y\rangle|y\rangle$


Since (1) and (2) prepare a uniform superposition over the neighbors of x and y, respectively, one
can think of them as taking one classical walk step “in superposition.” Note that ref($\mathcal{A}$) can be
implemented by applying the inverse of (1), putting a minus if the second register is not $|0\rangle$, and
applying (1). We can similarly implement ref($\mathcal{B}$) using (2) and its inverse. Hence we can think of
W(P) = ref($\mathcal{B}$)ref($\mathcal{A}$) as corresponding to four steps of the classical walk in superposition.

To see how to implement the reflection through $|U\rangle$, let us consider the eigenvalues and
eigenvectors of W(P). The eigenvalues of W(P) can be related to the eigenvalues $\lambda_1, \lambda_2, \ldots$ of P as follows.
Let $\theta_j \in [0, \pi/2]$ be such that $|\lambda_j| = \cos(\theta_j)$. We won’t prove it here, but it turns out that the
eigenvalues of W(P) are of the form $e^{\pm 2i\theta_j}$. The state $|U\rangle$ is an eigenvalue-1 eigenvector of W(P),
corresponding to $\theta_1 = 0$. The spectral gap of P is $\delta$, so all eigenvectors of W(P) that do not have
eigenvalue 1, have eigenvalue $e^{\pm 2i\theta_j}$ where $\theta_j \geq \sqrt{2\delta}$, because $1 - \delta \geq |\lambda_j| = \cos(\theta_j) \geq 1 - \theta_j^2/2$.
We now want to implement a reflection R(P) through the subspace spanned by the eigenvalue-1
eigenvectors of W(P); restricted to the 2-dimensional subspace spanned by $|G\rangle$ and $|B\rangle$ this will
be the desired reflection through $|U\rangle$.

We will implement R(P) by using phase estimation (see Section 4.6) with precision $\sqrt{\delta}/2$ on
W(P) to distinguish the eigenvalue-1 eigenvectors from the other eigenvectors. This precision
requires $O(1/\sqrt{\delta})$ applications of W(P), and $O(\log(1/\delta))$ auxiliary qubits that start in $|0\rangle$ and where
the estimate will be stored. Assume for simplicity that phase estimation always gives an estimate
$\tilde{\theta}_j$ of $\theta_j$ that is within precision $\sqrt{\delta}/2$.⁵ Because the nonzero $\theta_j$ are at least $\sqrt{2\delta}$, approximating
them within additive error $\sqrt{\delta}/2$ is good enough to determine whether the actual value $\theta_j$ itself is 0
or not. We then multiply the state with a −1 if the estimate is sufficiently far from 0, and finally
reverse the phase estimation to put the auxiliary qubits back to $|0\rangle$. Applied to some eigenvector
$|w\rangle$ of W(P) with corresponding eigenvalue $e^{\pm 2i\theta_j}$, R(P) maps


$$R(P): |w\rangle|0\rangle \stackrel{PE}{\longmapsto} |w\rangle|\tilde{\theta}_j\rangle \longmapsto (-1)^{[\tilde{\theta}_j \neq 0]} |w\rangle|\tilde{\theta}_j\rangle \stackrel{PE^{-1}}{\longmapsto} (-1)^{[\tilde{\theta}_j \neq 0]} |w\rangle|0\rangle.$$


This has the desired effect: ignoring the auxiliary qubits (which start and end in 0), R(P) maps 
eigenvalue-1 eigenvectors of W(P) to themselves, and puts a —1 in front of the other eigenvectors. 


Now that we know how to implement the algorithm, let us look at its complexity. Consider the 
following setup, update, and checking costs: 


• Setup cost S: the cost of constructing $|U\rangle$


• Checking cost C: the cost of the unitary map $|x\rangle|y\rangle \mapsto m_x|x\rangle|y\rangle$, where $m_x = -1$ if x is
marked, and $m_x = 1$ otherwise


• Update cost U: the cost of one step of the quantum walk, i.e., of W(P)


⁵Phase estimation will actually give a superposition over estimates $\tilde{\theta}_j$, with small but nonzero amplitudes on bad
estimates, but we’ll skip the technical details that are needed to deal with this.


The cost of part (a) of the algorithm is C. Since R(P) uses $O(1/\sqrt{\delta})$ applications of W(P), and
a few other gates, the cost of part (b) of the algorithm is essentially $O(U/\sqrt{\delta})$. Ignoring constant
factors, the total cost of the algorithm is then


$$S + \frac{1}{\sqrt{\varepsilon}}\left(C + \frac{1}{\sqrt{\delta}}\,U\right). \qquad (8.2)$$


Compare this with the classical cost of Eq. (8.1): quantum search square-roots both $\varepsilon$ and $\delta$.


8.3 Applications 


There are a number of interesting quantum walk algorithms that beat the best classical algorithms. 
We'll give three examples here. More can be found in [216]. 


8.3.1 Grover search 


Let us first derive a quantum algorithm for search. Suppose we have an N-bit string x of weight t,
and we know $t/N \geq \varepsilon$. Consider the complete graph G on N vertices. Then the matrix P for the
random walk on G has 0s on its diagonal, and its off-diagonal entries are all equal to 1/(N−1). This
can be written as $P = \frac{1}{N-1} J - \frac{1}{N-1} I$, where J is the all-1 matrix and I is the identity. The only
nonzero eigenvalue of J is N, and adding a multiple of I just shifts the eigenvalues of a matrix,
hence the largest eigenvalue of P is $\lambda_1 = N/(N-1) - 1/(N-1) = 1$ (corresponding to the uniform
vector) and all its other eigenvalues are $-1/(N-1)$. Note that the spectral gap $\delta$ is very large
here: $\delta = 1 - 1/(N-1) \approx 1$. We'll mark a vertex i iff $x_i = 1$. Then, measuring cost by number
of queries, a quantum walk on G will have S = U = 0 and C = 1. Plugging this into Eq. (8.2),
the quantum walk will find a marked vertex (with high probability) using $O(1/\sqrt{\varepsilon})$ queries. The
worst case is $\varepsilon = 1/N$, in which case we'll use $O(\sqrt{N})$ queries. Not surprisingly, we’ve essentially
rederived Grover’s algorithm.
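The spectrum claimed for the complete graph is easy to confirm numerically. This is a minimal sketch assuming NumPy, with N = 8 chosen arbitrarily.

```python
import numpy as np

N = 8
J = np.ones((N, N))                    # all-1 matrix
I = np.eye(N)
P = (J - I) / (N - 1)                  # random walk on the complete graph K_N

eigs = np.sort(np.linalg.eigvalsh(P))[::-1]
delta = 1.0 - np.max(np.abs(eigs[1:]))  # spectral gap
print(eigs[0], eigs[1:], delta)
```

The largest eigenvalue is 1, the remaining N − 1 eigenvalues all equal −1/(N − 1), and the gap is 1 − 1/(N − 1).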


8.3.2 Collision problem 


Consider the following collision problem: 


Input: $x = x_0, \ldots, x_{n-1}$, where each $x_i$ is an integer.⁶
Goal: find distinct i and j such that $x_i = x_j$ if these exist, otherwise output “all elements
are distinct.”


The decision version of this problem (deciding if there exists at least one collision) is also known 
as element distinctness. 

Consider the graph whose vertices correspond to the sets $R \subseteq \{0, \ldots, n-1\}$ of r elements. The
total number of vertices is $N = \binom{n}{r}$. We’ll put an edge between the vertices for R and R′ iff these
two sets differ in exactly two elements; in other words, you can get from R to R′ by removing one
element i from R and replacing it by a new element j. The resulting graph J(n,r) is known as
the Johnson graph. It is r(n − r)-regular, since every R has r(n − r) different neighbors R′. Its
spectral gap is known to be $\delta = \frac{n}{r(n-r)}$ [70, Sec. 12.3.2]; we won’t prove that here, just note that if
$r \ll n$, then $\delta \approx 1/r$. For each set R we also keep track of the corresponding sequence of x-values,
$x_R = (x_i)_{i \in R}$. Hence the full “name” of a vertex is the pair $(R, x_R)$.


⁶Say, all $x_i \leq n^2$ to avoid having to use too much space to store these numbers.
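The regularity and spectral gap of the Johnson graph can be checked by brute force for small parameters. The sketch below assumes NumPy; the helper name `johnson_walk_matrix` and the parameters n = 7, r = 3 are illustrative choices (picked so that the second-largest eigenvalue is also the largest one in absolute value among the non-trivial eigenvalues).

```python
from itertools import combinations
import numpy as np

def johnson_walk_matrix(n, r):
    """Normalized adjacency matrix of the Johnson graph J(n, r)."""
    vertices = [frozenset(c) for c in combinations(range(n), r)]
    d = r * (n - r)                              # degree of every vertex
    P = np.zeros((len(vertices), len(vertices)))
    for a, R in enumerate(vertices):
        for b, R2 in enumerate(vertices):
            if len(R & R2) == r - 1:             # R2 = R with one element swapped
                P[a, b] = 1.0 / d
    return P

n, r = 7, 3
P = johnson_walk_matrix(n, r)
eigs = np.sort(np.linalg.eigvalsh(P))[::-1]
delta = 1.0 - np.max(np.abs(eigs[1:]))           # spectral gap
print(P.shape[0], delta, n / (r * (n - r)))
```

For these parameters the numerically computed gap matches n/(r(n − r)); the construction is quadratic in the number of vertices, so it is only meant for tiny instances.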

We'll call a vertex in J(n,r) marked if it contains a collision, i.e., the corresponding set R
contains distinct i, j such that $x_i = x_j$. In the worst case there is exactly one colliding pair i, j
(more collisions only make the problem easier). The probability that i and j are both in a random
r-set R, is $\varepsilon = \frac{r}{n} \cdot \frac{r-1}{n-1}$. Hence the fraction of marked vertices is at least $\varepsilon \approx (r/n)^2$.

We will now determine the setup, checking, and update costs. The setup cost (measured in
terms of queries) is S = r + 1: we have to create a uniform superposition $|U\rangle$ over all edges (R, R′),
and for each such basis state query all r + 1 elements of $R \cup R'$ to add the information $x_R$ and
$x_{R'}$. Checking whether a given vertex $(R, x_R)$ contains a collision doesn’t take any queries because
we already have $x_R$, hence C = 0. To determine the update cost, note that mapping the second
register of $|R, x_R\rangle|0\rangle$ to a superposition of all neighbors $(R', x_{R'})$ requires querying (in superposition
for all neighbors R′) the value $x_j$ of the element j that was added to get R′. Hence U = O(1).
Plugging this into Eq. (8.2), the cost of a quantum walk algorithm for collision-finding is


$$S + \frac{1}{\sqrt{\varepsilon}}\left(C + \frac{1}{\sqrt{\delta}}\,U\right) = O(r + n/\sqrt{r}).$$


This cost is $O(n^{2/3})$ if we choose to set $r = n^{2/3}$ (rounded to an integer). What the quantum walk
produces at the end is a superposition where, if we measure the first register, with high probability
we'll see a marked vertex. That way we obtain a set R that contains a collision; and because R is
small, we can now cheaply find the colliding indices i, j ∈ R.
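To see the trade-off in r concretely, one can evaluate the cost expression of Eq. (8.2) with the values of S, C, $\varepsilon$ and $\delta$ derived above. The sketch below is plain Python; it sets U = 1 as a stand-in for the O(1) update cost and drops all other constants, so the numbers are only indicative.

```python
import math

def collision_walk_cost(n, r):
    """S + (C + U/sqrt(delta))/sqrt(eps) for the walk on J(n, r), constants dropped."""
    S, C, U = r + 1, 0, 1                        # U = 1 stands in for O(1)
    eps = (r / n) * ((r - 1) / (n - 1))          # fraction of marked vertices
    delta = n / (r * (n - r))                    # spectral gap of J(n, r)
    return S + (C + U / math.sqrt(delta)) / math.sqrt(eps)

n = 10**6
r_star = round(n ** (2 / 3))                     # the choice made in the text
best_r = min(range(2, n, 97), key=lambda r: collision_walk_cost(n, r))
print(r_star, best_r, collision_walk_cost(n, r_star))
```

The coarse grid search finds a minimizer within a constant factor of $n^{2/3}$, and the cost at $r = n^{2/3}$ is itself a small constant times $n^{2/3}$.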

This query complexity $O(n^{2/3})$ turns out to be the optimal quantum query complexity for the
collision problem [4]. By some more work involving efficient data structures, using a quantum-accessible
classical RAM, the time complexity (= total number of elementary quantum gates plus
total number of queries) can be brought down to $n^{2/3}(\log n)^{O(1)}$ [14].


8.3.3 Finding a triangle in a graph 
Consider the following triangle-finding problem: 


Input: the adjacency matrix of a graph H on n vertices. 
Goal: find vertices u,v, w that form a triangle (i.e., (u, v), (v, w), (w, u) are all edges in 
the graph), if they exist. 


We’ll assume we have query access to the entries of the adjacency matrix of H, which tells us
whether (u,v) is an edge or not. There are $\binom{n}{2}$ bits in this oracle, one for each potential edge
of H. It is not hard to see that a classical algorithm needs $\Omega(n^2)$ queries before it can decide
with good probability whether a graph contains a triangle or not. For example, take a bipartite
graph consisting of 2 sets of n/2 vertices each, such that any pair of vertices from different sets is
connected by an edge. Such a graph is triangle-free, but adding any one edge will create a triangle.
A classical algorithm would have to query all those edges separately.

Let us try a quantum walk approach. Again consider the Johnson graph J(n,r). Each vertex
will correspond to a set $R \subseteq \{0, \ldots, n-1\}$ of r vertices, annotated with the result of querying
all possible $\binom{r}{2}$ edges having both endpoints in R. We will call the vertex for set R marked if it
contains one edge of a triangle. If there is at least one triangle in the graph, then the fraction of
marked vertices is at least $\varepsilon \approx (r/n)^2$.

The setup cost will be $S = \binom{r+1}{2}$: for an edge (R, R′) of the Johnson graph we query the $\binom{r+1}{2}$
possible edges induced by the r + 1 H-vertices of $R \cup R'$. The update cost will be U = 2r − 2,
because if we remove one vertex i from R then we have to remove information about r − 1 edges
in H, and if we add a new j to R we have to query r − 1 new edges in H.

Getting a good upper bound for the checking cost C requires some more work, namely Grover
search plus another quantum walk! Suppose we are given a set R of r vertices. How do we decide
whether R contains an edge of a triangle? If we can efficiently decide, for a given u and R, whether
R contains vertices v, w such that u, v, w form a triangle in H, then we could combine this with a
Grover search over all n possible vertices u of H. Given u and R, let us design a subroutine based on
another quantum walk, this time on the Johnson graph $J(r, r^{2/3})$. Each vertex of this Johnson graph
corresponds to a subset $R' \subseteq R$ of $r' = r^{2/3}$ vertices. Its spectral gap is $\delta' = r/(r'(r - r')) \approx 1/r^{2/3}$.
We’ll mark R′ if it contains vertices v, w such that u, v, w form a triangle. If there is at least one
triangle involving u and some v, w ∈ R, then the fraction of marked vertices R′ in $J(r, r^{2/3})$ is at
least $\varepsilon' \approx (r'/r)^2 = 1/r^{2/3}$. For this subroutine, the setup cost is $O(r^{2/3})$ (for each v ∈ R′, query
whether (u,v) is an edge in H); the update cost is O(1) (if we replace v in R′ by w, then we need
to “unquery” edge (u,v) and query edge (u,w)); and the checking cost is 0. Plugging this into
Eq. (8.2), we can decide whether a fixed u forms a triangle with two vertices in R, using $O(r^{2/3})$
queries. Let’s ignore the small error probability of the latter subroutine (it can be dealt with, but
that’s technical). Then we can combine it with Grover search over all n vertices u to get checking
cost $C = O(\sqrt{n}\, r^{2/3})$.

Plugging these S, U, and C into Eq. (8.2), the overall cost of a quantum walk algorithm for
triangle-finding is


$$S + \frac{1}{\sqrt{\varepsilon}}\left(C + \frac{1}{\sqrt{\delta}}\,U\right) = O\left(r^2 + \frac{n}{r}\left(\sqrt{n}\, r^{2/3} + r^{3/2}\right)\right).$$

This is $O(n^{13/10})$ if we set $r = n^{3/5}$ [183]. The quantum walk algorithm ends with a superposition
where most of the amplitude sits on sets R containing one edge of a triangle (i.e., two vertices of
H that are part of a triangle). Now a measurement of that final state gives us such a set R with
high probability, and then it’s relatively cheap to find the third vertex of the triangle by another
Grover search over the n − r vertices of H that are not in R.
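The choice $r = n^{3/5}$ can be recovered by balancing exponents. Writing $r = n^a$, each of the three terms $r^2$, $(n/r)\sqrt{n}\,r^{2/3}$ and $(n/r)\,r^{3/2}$ is a power of n, and the overall exponent is the largest of the three. The sketch below (plain Python with exact fractions; the grid of candidate values for a is an arbitrary illustrative choice) minimizes that maximum.

```python
from fractions import Fraction

def triangle_cost_exponent(a):
    """Exponent of n in r^2 + (n/r)(sqrt(n) r^(2/3) + r^(3/2)) when r = n^a."""
    setup = 2 * a                                             # r^2
    checking = 1 - a + Fraction(1, 2) + Fraction(2, 3) * a    # (n/r) sqrt(n) r^(2/3)
    update = 1 - a + Fraction(3, 2) * a                       # (n/r) r^(3/2)
    return max(setup, checking, update)

grid = [Fraction(i, 100) for i in range(0, 101)]   # candidate exponents a in [0, 1]
best = min(grid, key=triangle_cost_exponent)
print(best, triangle_cost_exponent(best))
```

The checking term decreases with a while the update term increases, and they cross at a = 3/5, where both equal 13/10 and the setup term 6/5 is smaller.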

The exponent 13/10 can be slightly improved further [41, 163, 145], and the current best
exponent is 5/4 [162]. It is an open question what the optimal quantum query complexity for
triangle-finding is; the best lower bound is only $\Omega(n)$. Also, the optimal quantum time complexity
of this problem is still wide open.


Exercises 


1. Let d < n, and P be the projector on a d-dimensional subspace $V \subseteq \mathbb{R}^n$ that is spanned by
orthonormal vectors $v_1, \ldots, v_d$. This means that Pv = v for all v ∈ V, and Pw = 0 for all w
that are orthogonal to V.


(a) Show that P can be written in Dirac notation as $P = \sum_{i=1}^d |v_i\rangle\langle v_i|$.


(b) Show that R = 2P − I is a reflection through the subspace corresponding to P, i.e.,
Rv = v for all v in the subspace, and Rw = −w for all w that are orthogonal to the
subspace.


2. Let G be a d-regular graph that is bipartite, so its vertex set V = [N] can be partitioned
into disjoint sets A and B, and all its edges are in A × B. Give an eigenvector with
eigenvalue 1 of the associated N × N normalized adjacency matrix P, and another eigenvector
with eigenvalue −1.


3. This exercise is about obtaining a quantum algorithm for the collision problem with a slightly
different quantum walk. Consider the problem of Section 8.3.2: we can query elements of
the sequence of integers $x_0, \ldots, x_{n-1}$, and want to find distinct i and j such that $x_i = x_j$
(or report that there are no collisions). Again consider the Johnson graph J(n,r), for some
r to be optimized over later. Deviating from Section 8.3.2, now call a vertex R marked if
there exist i ∈ R and j ∈ [n] \ R such that $x_i = x_j$. Show that we can find a marked vertex
in this graph with high probability using $O(n^{2/3})$ queries to x. You may ignore small error
probabilities, for example when using Grover’s algorithm. Be explicit about what data you
store about x at each vertex R.


4. (H) Let A, B, and C be n × n matrices with real entries. We’d like to decide whether or not
AB = C. Of course, you could multiply A and B and compare the result with C, but matrix
multiplication is expensive (the current best algorithm takes time roughly $O(n^{2.38})$).


(a) Give a classical randomized algorithm that verifies whether AB = C (with success
probability at least 2/3) using $O(n^2)$ steps, using the fact that matrix-vector multiplication
can be done in $O(n^2)$ steps.

(b) Show that if we have query-access to the entries of the matrices (i.e., oracles that map
$i, j, 0 \mapsto i, j, A_{ij}$ and similarly for B and C), then every classical algorithm needs at least
$\Omega(n^2)$ queries to detect a difference between AB and C with error probability ≤ 1/3.

(c) Give a quantum walk algorithm that verifies whether AB = C (with success probability
at least 2/3) using $O(n^{5/3})$ queries to matrix-entries.


5. A 3-SAT instance $\phi$ over n Boolean variables $x_1, \ldots, x_n$ is a formula which is the AND of a
number of clauses, each of which is an OR of 3 variables or their negations. For example,
$\phi(x_1, \ldots, x_4) = (x_1 \vee x_2 \vee \overline{x_3}) \wedge (x_2 \vee x_3 \vee \overline{x_4})$ is a 3-SAT formula with 2 clauses. A satisfying
assignment is a setting of the n variables such that $\phi(x_1, \ldots, x_n) = 1$ (i.e., TRUE). You may
assume the number of clauses is at most some polynomial in n. In general it is NP-hard to
find a satisfying assignment to such a formula. Brute force would try out all $2^n$ possible
truth-assignments, but something much better is possible: consider the following simple algorithm
of Schöning [219], which is a classical random walk on the set of all $N = 2^n$ truth assignments:


Start with a uniformly random $x \in \{0,1\}^n$.
Repeat the following at most 3n times: if $\phi(x) = 1$ then STOP, else find the leftmost
clause that is false, randomly choose one of its 3 variables and flip its value.


One can show that this algorithm has probability at least $(3/4)^n/\sqrt{5n}$ of finding a satisfying
assignment (if $\phi$ is satisfiable). You may assume this without proof.


(a) Use the above to give a classical algorithm that finds a satisfying assignment with high
probability in time $(4/3)^n \cdot p(n)$, where p(n) is some polynomial factor.

(b) (H) Give a quantum algorithm that finds a satisfying assignment (with high probability)
in time $\sqrt{(4/3)^n} \cdot p(n)$.


Chapter 9


Hamiltonian Simulation 


9.1 Hamiltonians 


Thus far, we have viewed the dynamics of quantum systems from the perspective of unitary
transformations: apart from measurement, the only way a quantum state (i.e., a vector of amplitudes)
can change is by multiplication with a unitary matrix, for instance a 2-qubit gate tensored with
identities on the other qubits. But which unitary will actually occur in a given physical system?
This is determined by the Hamiltonian of the system, which is the observable H corresponding to
the total energy in the system. The expectation value $\langle\psi|H|\psi\rangle$ is called the energy of state $|\psi\rangle$.
Typically, this total energy is the sum of several different terms, corresponding to kinetic energy,
potential energy, etc. Also typically, it is the sum of many local terms that each act on only a few
of the particles (qubits) of the system, for example if all interactions are between pairs of particles.

One can think of the Hamiltonian H as describing the physical characteristics of the system.
These do not determine the initial state $|\psi(0)\rangle$ of the system, but they do determine the evolution of
the state in time, i.e., the state $|\psi(t)\rangle$ as a function of the time-parameter t, given initial state $|\psi(0)\rangle$.
This is governed by the most important equation in quantum mechanics: the Schrödinger equation.
It is a linear differential equation that relates the time-derivative of the current state to that state
itself and to the Hamiltonian:

$$i\hbar \frac{d|\psi(t)\rangle}{dt} = H|\psi(t)\rangle.$$

Here $\hbar$ is a very small (at least in standard units) yet very important physical constant: Planck’s
constant divided by $2\pi$. We can set it to 1 by choosing appropriate units, and hence will ignore it
from now on. In general H may itself change with t, but for simplicity we will only consider here
the case where H is time-independent. Then, if we start in some state $|\psi(0)\rangle$, the solution to this
differential equation is the following unitary evolution of the state:¹


$$|\psi(t)\rangle = U|\psi(0)\rangle, \quad \text{where } U = e^{-iHt}.$$


So t time-steps of evolution induced by Hamiltonian H, corresponds to applying the unitary matrix
$e^{-iH}$ t times. Note, however, that t need not be integer here: this evolution is continuous in time,
in contrast to the discrete picture one gets from the circuit model with elementary quantum gates.
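For a very small system, this evolution can be computed classically by diagonalizing H and exponentiating its eigenvalues. The sketch below assumes NumPy; the function name `evolve` and the toy Hamiltonian (the Pauli X matrix on one qubit) are illustrative choices, and the approach scales exponentially in the number of qubits, so it only serves to make the formula concrete.

```python
import numpy as np

def evolve(H, t):
    """Return U = exp(-i H t) for a Hermitian matrix H via its eigendecomposition."""
    eigvals, V = np.linalg.eigh(H)                 # H = V diag(eigvals) V^*
    return V @ np.diag(np.exp(-1j * eigvals * t)) @ V.conj().T

X = np.array([[0, 1], [1, 0]], dtype=complex)     # Pauli X as a toy Hamiltonian
psi0 = np.array([1, 0], dtype=complex)            # initial state |0>

U = evolve(X, np.pi / 2)
psi_t = U @ psi0
print(np.round(psi_t, 6))
```

Since $e^{-iXt} = \cos(t) I - i\sin(t) X$, evolving $|0\rangle$ for time $\pi/2$ gives $-i|1\rangle$. The resulting matrix is unitary, and evolving for times s and then t agrees with evolving for time s + t.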


¹Applying a function, for instance $f(x) = e^{-ix}$, to a normal matrix means applying f to its eigenvalues: if A has
diagonalization $VDV^{-1}$ then $f(A) = V f(D) V^{-1}$, where f(D) is the diagonal matrix obtained by applying f to the
diagonal entries of D. For example, if $A = \sum_j \lambda_j a_j a_j^*$ and $f(x) = e^{-ix}$, then $f(A) = \sum_j e^{-i\lambda_j} a_j a_j^*$. Note that if A
is Hermitian, then $e^{iA}$ is unitary.

In areas like quantum chemistry (i.e., the study of properties of molecules and their interaction)
and material sciences, it is often important to figure out how a quantum system will evolve from
some given initial state, for instance a basis state.² This is typically hard to do on classical
computers, since the number of parameters (amplitudes) is exponential in the number of particles.
However, a quantum computer is like a universal quantum system, and should be able to efficiently
simulate every efficient quantum process, in the same way that a classical universal Turing machine
can efficiently simulate other (classical) physical processes.³ In fact, this was the main reason
why Feynman invented quantum computers: as a controllable quantum system that can be used
to simulate other quantum systems. In order to realize that idea, we need methods to efficiently
implement the unitary evolution that is induced by a given Hamiltonian. In other words, we need
methods to implement $U = e^{-iHt}$ as a quantum circuit of gates (say, up to some small error $\varepsilon$⁴),
and to apply this to a given initial state $|\psi\rangle$. This is known as the problem of “Hamiltonian
simulation.”

In this chapter we will cover several methods for Hamiltonian simulation. For simplicity we’ll
ignore the minus sign in Hamiltonian simulation, implementing $U = e^{iHt}$ rather than $e^{-iHt}$. We
will also assume that our quantum system consists of n qubits. Some physical systems, for instance
electron spins, naturally correspond to qubits. More complicated Hilbert spaces, for instance with 
basis states labeled by the positions (x,y,z coordinates) of all particles involved, can be encoded 
(approximately) in binary to reduce them to the case of qubits. This encoding can be done in many 
ways; much of the art in quantum chemistry is in how best to do this for specific systems, but we 
won’t study that here (see for instance [77]). 


Word of warning: this chapter is denser and more complicated than most of the other chapters in 
these notes. On the other hand, unlike those chapters it explains some very recent, cutting-edge 
results. 


9.2 Method 1: Lie-Suzuki-Trotter methods 


Note that an n-qubit Hamiltonian is a $2^n \times 2^n$ matrix, which is huge even for moderate n. Typically
in Hamiltonian simulation we are dealing with very structured Hamiltonians that have a much
shorter classical description. Suppose our Hamiltonian is of the form $H = \sum_{j=1}^m H_j$, where m is
not too big (say, polynomial in n) and each $H_j$ acts only on a few of the n qubits. For concreteness
assume each $H_j$ acts non-trivially on only two of the qubits.⁵ Such a Hamiltonian is called 2-local.
Note that, for fixed t, the unitary $e^{iH_j t}$ is really just a 2-qubit gate, acting like identity on the other
n − 2 qubits; this 2-qubit gate could in turn be constructed from CNOTs and single-qubit gates.


²It is also very important in chemistry to be able to find out global properties of a given Hamiltonian like its
lowest energy, a.k.a. ground state energy. Unfortunately this problem seems to be hard to solve (in fact it is so-called
QMA-hard, see Chapter 14) even for a quantum computer, even for the special case of 2-local Hamiltonians [156, 149].

³In Chapter 13 we will see that it is actually possible to classically simulate quantum computers (and hence
quantum systems more generally) with a polynomial amount of space, but our best methods still use an exponential
amount of time. If factoring a large integer is a hard problem for classical computers (which is widely believed),
then Shor’s efficient quantum factoring algorithm (Chapter 5) implies that it is impossible to simulate a quantum
computer in polynomial time on a classical computer.

⁴If n-qubit unitary $\tilde{U}$ (e.g., a quantum circuit with not too many gates) is meant to approximate n-qubit unitary U,
then we can measure the error by the operator norm of their difference $\|U - \tilde{U}\| = \max_{\psi} \|U|\psi\rangle - \tilde{U}|\psi\rangle\|$. However,
we will also see simulation methods that allow some auxiliary qubits, say a of them, which start in state $|0^a\rangle$ and
should end in something close to state $|0^a\rangle$. In this case $\tilde{U}$ acts on more qubits than U, so we cannot use the operator
norm of their difference; instead we can measure the error on the subspace of (n + a)-qubit states where the last a
qubits are $|0\rangle$: $\max_{\psi} \|(U|\psi\rangle)|0^a\rangle - \tilde{U}(|\psi\rangle|0^a\rangle)\|$. This way of measuring error still allows you to analyze a sequence
of approximate unitaries using triangle inequality (as in Ex 4.4) in a way that the errors add up at most linearly.


Our goal is to implement $U = e^{iHt} = e^{i\sum_{j=1}^m H_j t}$. It is now tempting to view this exponential
of a sum of matrices as a product $\prod_{j=1}^m e^{iH_j t}$, which is just a product of m 2-qubit gates. If all
terms $H_j$ are diagonal, or if there is some basis in which all terms are diagonal (equivalently, if
all $H_j$ commute), then this indeed works out. However, in general matrix exponentials do not
work that way: $e^{A+B}$ need not equal $e^A e^B$ if A and B do not commute (see Exercise 1). The
Lie-Suzuki-Trotter decomposition gives us a way to handle this. It uses the fact that if A and B
have small operator norm, then $e^{A+B}$ and $e^A e^B$ are approximately equal: $e^{A+B} = e^A e^B + E$, where
the error-term E is a matrix whose operator norm is $O(\|A\| \cdot \|B\|)$.⁶

How can we use this to approximate U by a circuit U of 2-qubit gates? Assume each of the terms 
Hj has operator norm < 1 (see Exercise 2 for why such normalization matters). First consider the 
simple case m = 2, so H = Hı + Hə. We can now implement U = et”t by doing a little bit of Hi, a 
little bit of Hə, a little bit of Hı, etc. More precisely, for every integer r > 1 of our choice, we have 


i= eit _ (Htr) = (reer Z (eee eer + EY. (9.1) 


Here the error-term E has norm ||E|| = O(||¢Hit/r|| - ||tHet/r||) = O(||Ai|| - | F2l|t?/r?). Our 
approximating circuit will be U = (e#1"/"e#42"/")" which uses 2r = O(t?/e) 2-qubit gates. Since 
errors in a product of unitaries add at most linearly (see Exercise 4.4), we have approximation error 


|v -|| <r = OCI - Hel? /r) = O@/r). 


Choosing r = O(t?/e), we can make this error < e. 
The same idea works for the general case where we have m > 2 Hamiltonian terms: 


U= et z (etr) = hare ine = Cages o. etHmt/r ai EY, (9.2) 


where || E|| = O(m?t?/r?) (Exercise 3). Choosing r = O(m?t?/e), we have an approximating circuit 
U = (eHit/r ... ctHmt/r)r with mr = O(mt?/e) 2-qubit gates, and error |v — T| < r||El| < e. 

This is the first-order Lie-Suzuki-Trotter approach to Hamiltonian simulation, due to Lloyd [172].
The number of gates of the circuit $\tilde{U}$ depends quadratically on the time $t$ for which we want to
simulate the evolution, which is not optimal. One can do fancier higher-order Lie-Suzuki-Trotter
decompositions that make the dependence on $t$ nearly linear, but we won’t explain those here.
Instead we will describe two methods below with linear $t$-dependence. The number of gates of $\tilde{U}$
depends polynomially on $1/\varepsilon$; this can be very much improved as well, as we will see.⁷


⁵This means $H$ can be described efficiently by $m$ $4\times 4$ matrices, rather than by a $2^n\times 2^n$ matrix. A different
assumption that is often made on Hamiltonians and that we will see later, is that $H$ is $s$-sparse, meaning each of the
$2^n$ columns has at most $s$ nonzero entries, and we have some efficient “sparse access” to these nonzero entries. Note
that if $H = \sum_{j=1}^m H_j$ and each $H_j$ acts on only 2 qubits, then $H$ is $4m$-sparse. Thus, roughly speaking, the locality
assumption implies the sparsity assumption.

⁶A non-rigorous but reasonably convincing way to see this is to approximate term $e^M$ by its first-order Taylor
series $I + M$, which is a good approximation if $M$ has small norm (the error of the approximation will be quadratic
in that norm). Then $e^Ae^B - e^{A+B} \approx (I+A)(I+B) - (I+A+B) = AB$. In case you ever need it: the so-called
Baker-Campbell-Hausdorff formula gives a much more precise expression.

⁷While the upper bounds on the number of gates for Trotter methods are theoretically worse (in their dependence
on $t$ and $\varepsilon$) than the other two methods explained in this chapter, in practice Trotter is quite competitive [85]. Trotter
also has the advantages of being relatively simple and of not requiring any auxiliary qubits.

9.3 Method 2: Linear combination of unitaries (LCU)


Here we will describe a method for Hamiltonian simulation whose complexity depends linearly on
the time $t$ for which we want to evolve the state, and only logarithmically on the desired error $\varepsilon$.

Let’s start with a more general problem. Suppose we have a $2^n\times 2^n$ matrix $M$ and an $n$-qubit
state $|\psi\rangle$, and we would like to prepare the state $M|\psi\rangle/\|M|\psi\rangle\|$. Here $M$ need not be unitary, but
suppose we can write $M$ as a linear combination of unitaries:⁸

$$M = \sum_{j=1}^m \alpha_jV_j$$

with the $\alpha_j$ being nonnegative reals (we can always absorb complex phases into the $V_j$). Let
$\|\alpha\|_1 = \sum_j \alpha_j$, and let $W$ be a unitary acting on $\lceil\log m\rceil$ qubits that maps

$$W : |0\rangle \mapsto \frac{1}{\sqrt{\|\alpha\|_1}}\sum_j \sqrt{\alpha_j}\,|j\rangle.$$

Suppose each $V_j$ is an “easy” unitary, for instance a 2-qubit gate tensored with identity on the other
$n-2$ qubits, or a small circuit. Also suppose we can implement these unitaries in a controlled way:
we have access to a 2-register unitary $V = \sum_j |j\rangle\langle j|\otimes V_j$. This maps $|j\rangle|\phi\rangle \mapsto |j\rangle V_j|\phi\rangle$, and we
can think of the first register as “selecting” which unitary $V_j$ to apply to the second register.⁹
We want to use $V$ and $W$ to implement $M$ on a given state $|\psi\rangle$. Consider the following algorithm:


1. Start with two-register state $|0\rangle|\psi\rangle$, where the first register has $\lceil\log m\rceil$ qubits.
2. Apply $W$ to the first register.
3. Apply $V$ to the whole state.
4. Apply $W^{-1}$ to the first register.


A small calculation (see Exercise 6) shows that the resulting state can be written as

$$\frac{1}{\|\alpha\|_1}|0\rangle M|\psi\rangle + \sqrt{1 - \frac{\|M|\psi\rangle\|^2}{\|\alpha\|_1^2}}\,|\phi\rangle, \tag{9.3}$$

where $|\phi\rangle$ is some other normalized state that we don’t care about, but that has no support on
basis states where the first register is $|0\rangle$. Note that the state of (9.3) has norm 1, because the
squared norm of the first term is $\|M|\psi\rangle\|^2/\|\alpha\|_1^2$.
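The 4-step algorithm and the form of Eq. (9.3) can be checked numerically on a toy instance. The sketch below (Python, NumPy only) assumes an illustrative $M = 0.5X + 0.3Z + 0.2I$ on one qubit, pads the list to $m = 4$ with a zero coefficient, and builds $W$ as a Householder reflection. That is just one convenient unitary with the required first column, not a construction prescribed by the text, and all variable names are hypothetical.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# Illustrative LCU: M = 0.5*X + 0.3*Z + 0.2*I, padded to m = 4 terms.
alpha = np.array([0.5, 0.3, 0.2, 0.0])
Vs = [X, Z, I2, I2]
M = sum(a * Vj for a, Vj in zip(alpha, Vs))
norm1 = alpha.sum()

# W maps |0> to sum_j sqrt(alpha_j / ||alpha||_1) |j>; a real Householder
# reflection does this and is its own inverse.
w = np.sqrt(alpha / norm1)
e0 = np.zeros(4)
e0[0] = 1.0
v = e0 - w
W = np.eye(4) - 2.0 * np.outer(v, v) / (v @ v)

# select-V = sum_j |j><j| (x) V_j, with the index register written first.
select_V = sum(np.kron(np.outer(ej, ej), Vj) for ej, Vj in zip(np.eye(4), Vs))

psi = np.array([1.0, 2.0j]) / np.sqrt(5.0)
state = np.kron(e0, psi)                 # step 1: |0>|psi>
state = np.kron(W, I2) @ state           # step 2: apply W
state = select_V @ state                 # step 3: apply V
state = np.kron(W.T, I2) @ state         # step 4: apply W^{-1}

top = state[:2]                          # component with first register |0>
p = np.linalg.norm(top) ** 2             # probability of outcome 0
print("block with first register |0>:", top)
print("M|psi> / ||alpha||_1         :", M @ psi / norm1)
print("success probability p        :", p)
```

The two printed vectors agree, and $p$ equals $\|M|\psi\rangle\|^2/\|\alpha\|_1^2$ as stated around Eq. (9.3).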

If we were to measure the first register, the probability of outcome 0 is $p = \|M|\psi\rangle\|^2/\|\alpha\|_1^2$.
In case of that measurement outcome, the second register would collapse to the normalized version
of $M|\psi\rangle$, as desired. The success probability $p$ may be small, but we could use
$O(1/\sqrt{p}) = O(\|\alpha\|_1/\|M|\psi\rangle\|)$ rounds of amplitude amplification to amplify the part of the state that starts
with $|0\rangle$.¹⁰ Thus we would prepare (the normalized version of) $M|\psi\rangle$ in the second register.
Unfortunately this usage of amplitude amplification assumes the ability to implement a unitary (as well
as its inverse) to prepare $|\psi\rangle$ from a known initial state, say $|0\rangle$. Regular amplitude amplification
won’t work if instead of a unitary that prepares $|\psi\rangle$ we just have one copy of the state $|\psi\rangle$ available,
which is the typical situation in Hamiltonian simulation. However, Exercise 8 gives us a variant
called oblivious amplitude amplification, which circumvents this problem: it works even with just
one copy of $|\psi\rangle$, as long as $M$ is proportional to a unitary (or close to that). Fortunately, this is
the situation when we use LCU for Hamiltonian simulation, where $M \approx e^{iHt}$.

⁸In fact every $M$ can be written in such a way, because the $4^n$ $n$-qubit Pauli matrices (each of which is unitary)
form a basis for the linear space of all $2^n\times 2^n$ matrices. See Appendix A.9.

⁹In the literature, this $V$ is often called “select-V.” One might expect the cost of $V$ to be not much higher than
the costliest $V_j$, just like the cost of a classical “if A then B, else C” statement is not much bigger than the largest of
the costs of B and C. However, if we measure circuit size, then the cost of $V$ could be roughly the sum of the costs
of the $V_j$s because circuits for each $V_j$ should be “included” in the circuit for $V$.


9.3.1 Hamiltonian simulation via LCU 


Recall that our goal is to efficiently implement the unitary $e^{iHt}$ that is induced by a given Hamiltonian
$H$, normalized so that $\|H\| \leq 1$. The following approach is due to Berry et al. [54, 55, 56].
Suppose, somewhat paradoxically, that we can write out the Hermitian matrix $H$ as a linear combination
of unitaries: $H = \sum_{j=1}^m \alpha_jV_j$. For example, if $H$ is the sum of $m$ 2-local terms like before,
then every 2-local term can be written as the sum of at most 16 $n$-qubit Pauli matrices (each of
which is unitary and acts non-trivially on only two qubits). Thus we would decompose $H$ as a
sum of at most $16m$ unitaries, each acting non-trivially on only two of the $n$ qubits. The sum of
coefficients $\|\alpha\|_1$ will be $O(m)$.
Using the Taylor series $e^x = \sum_{k=0}^\infty x^k/k!$, we write the unitary we want to implement as

$$e^{iHt} = \sum_{k=0}^\infty \frac{(iHt)^k}{k!} = \sum_{k=0}^\infty \frac{(it)^k}{k!}\Big(\sum_{j\in[m]}\alpha_jV_j\Big)^k = \sum_{k=0}^\infty \frac{(it)^k}{k!}\sum_{j_1,\ldots,j_k\in[m]}\alpha_{j_1}\cdots\alpha_{j_k}V_{j_1}\cdots V_{j_k}. \tag{9.4}$$


Note that if each $V_j$ is easy to implement and $k$ is not too big, then the unitary $V_{j_1}\cdots V_{j_k}$ is
also not too hard to implement. Exercise 9 shows that if we truncate the Taylor series at $k =
O(t+\log(1/\varepsilon))$, dropping the terms of higher order, then the induced error (i.e., the dropped part)
has operator norm at most $\varepsilon$. Accordingly, we can take the part of the right-hand side of Eq. (9.4)
for $k = O(t+\log(1/\varepsilon))$ and then use the linear combination of unitaries approach to approximately
implement $e^{iHt}$. The unitaries in this decomposition are of the form $V_{j_1,\ldots,j_k} = i^kV_{j_1}\cdots V_{j_k}$; let
$\mathcal{V} = \sum_{j_1,\ldots,j_k}|j_1,\ldots,j_k\rangle\langle j_1,\ldots,j_k|\otimes V_{j_1,\ldots,j_k}$ denote the controlled operation of the $V_{j_1,\ldots,j_k}$ unitaries,
each of which involves $k$ $V_j$’s. The corresponding nonnegative coefficients in this decomposition are

$$\beta_{j_1,\ldots,j_k} = \frac{t^k}{k!}\alpha_{j_1}\cdots\alpha_{j_k}, \quad\text{for } k \leq O(t+\log(1/\varepsilon)).$$


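These $\beta$-coefficients add up to

$$\sum_{k=0}^{O(t+\log(1/\varepsilon))}\ \sum_{j_1,\ldots,j_k}\beta_{j_1,\ldots,j_k} \leq \sum_{k=0}^{\infty}\frac{t^k}{k!}\sum_{j_1,\ldots,j_k}\alpha_{j_1}\cdots\alpha_{j_k} = \sum_{k=0}^{\infty}\frac{(t\|\alpha\|_1)^k}{k!} = e^{t\|\alpha\|_1},$$

so straightforward application of the LCU method with oblivious amplitude amplification uses
$O(\|\beta\|_1) = O(e^{t\|\alpha\|_1})$ applications of $\mathcal{V}$ and $\mathcal{V}^{-1}$.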
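The identity behind this bound, namely that summing $\alpha_{j_1}\cdots\alpha_{j_k}$ over all $k$-tuples gives $\|\alpha\|_1^k$, can be verified by brute force on a tiny instance. In the Python sketch below, the coefficients `alpha`, the time `t` and the truncation order `K` are arbitrary illustrative values, not taken from the text.

```python
import itertools
import math

alpha = [0.5, 0.3, 0.2]   # hypothetical LCU coefficients of H
t = 0.7                   # evolution time
K = 6                     # truncation order of the Taylor series
norm1 = sum(alpha)

# Brute-force sum of beta_{j_1..j_k} = t^k/k! * alpha_{j_1} ... alpha_{j_k}
# over all k <= K and all k-tuples (j_1, ..., j_k).
beta_sum = 0.0
for k in range(K + 1):
    for js in itertools.product(range(len(alpha)), repeat=k):
        beta_sum += t**k / math.factorial(k) * math.prod(alpha[j] for j in js)

# The same quantity in closed form: a partial sum of exp(t * ||alpha||_1).
partial = sum((t * norm1) ** k / math.factorial(k) for k in range(K + 1))

print("brute-force sum of betas  :", beta_sum)
print("partial sum of the series :", partial)
print("upper bound exp(t*||a||_1):", math.exp(t * norm1))
```

The brute-force sum matches the partial sum of the exponential series and stays below $e^{t\|\alpha\|_1}$.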


¹⁰If we do not know the value of $p$ in advance, then we can try out exponentially decreasing guesses for $p$, like we
do for Grover’s algorithm in Chapter 7 when we don’t know the number of solutions.

The logarithmic error-dependence of the complexity of the above method is excellent. The
exponential dependence on $t\|\alpha\|_1$ is quite terrible for large $t$, but not too bad for very small $t$. So
what we’ll do if we want to do a simulation for large $t$, is to divide that $t$ into $b = t\|\alpha\|_1$ blocks
of time $\tau = 1/\|\alpha\|_1$ each, run the above algorithm for time $\tau$ with error $\varepsilon' = \varepsilon/b$, and then glue
$b$ time-$\tau$ simulations together. This will simulate $(e^{iH\tau})^b = e^{iHt}$, with error $\leq b\varepsilon' = \varepsilon$. The cost
of each time-$\tau$ simulation is $O(e^{\tau\|\alpha\|_1}) = O(1)$ applications of $\mathcal{V}$ and $\mathcal{V}^{-1}$, each of which involves
$O(\tau + \log(1/\varepsilon')) = O(\log(t\|\alpha\|_1/\varepsilon))$ applications of the $V_j$’s. The overall cost will be $b$ times that,
since we’ll run $b$ subsequent time-$\tau$ simulations in order to implement a time-$t$ simulation.

To give a more concrete example, consider again the special case where $H = \sum_i H_i$ consists of
2-local terms, so the unitaries $V_j$ in the induced linear combination of unitaries $H = \sum_{j=1}^m \alpha_jV_j$
only act nontrivially on 2 qubits each. Then we approximate the time-$\tau$ unitary $e^{iH\tau}$ by a linear
combination of unitaries

$$M = \sum_{k=0}^{O(\tau+\log(1/\varepsilon'))}\ \sum_{j_1,\ldots,j_k\in[m]}\beta_{j_1,\ldots,j_k}V_{j_1,\ldots,j_k}, \tag{9.5}$$

where each $V_{j_1,\ldots,j_k}$ is a product of $k = O(\tau + \log(1/\varepsilon')) = O(\log(t\|\alpha\|_1/\varepsilon))$ 2-qubit gates. We can
implement this using the linear combination of unitaries approach, and repeat this $b = t\|\alpha\|_1$ times.
The cost of the unitary $W$ is typically relatively small (see Exercise 7), so we can $\varepsilon$-approximate
the unitary $e^{iHt}$ using a circuit of roughly $O(t\|\alpha\|_1\log(t\|\alpha\|_1/\varepsilon)) = O(mt\log(mt/\varepsilon))$ applications
of $V$ and $V^{-1}$, and slightly more other 2-qubit gates. Note the linear dependence of the cost on
the evolution-time $t$, and the logarithmic dependence on the error $\varepsilon$, both of which are better than
Lie-Suzuki-Trotter methods.


9.4 Method 3: Transforming block-encoded matrices 


In this section we’ll describe a recent approach that is very general and flexible. Suppose $A$ is an
$n$-qubit matrix with operator norm $\|A\| \leq 1$, and we can implement an $(n+1)$-qubit unitary

$$U = \begin{pmatrix} A & \cdot \\ \cdot & \cdot \end{pmatrix}. \tag{9.6}$$

The ‘$\cdot$’s are unspecified $2^n\times 2^n$-dimensional matrices, the only constraint on which is that $U$ is
unitary. Such a $U$ is called a unitary block-encoding of $A$. Note that

$$U : |0\rangle|\psi\rangle \mapsto |0\rangle A|\psi\rangle + |1\rangle|\phi\rangle,$$

where we can’t say much about the (subnormalized) state $|\phi\rangle$. Written more technically, the defining
property of such a block-encoding is $(\langle 0|\otimes I)U(|0\rangle\otimes I) = A$, where the first register is one qubit.
More generally we can define an $a$-qubit block-encoding of $A$, which is an $(a+n)$-qubit unitary $U$
with the property that $(\langle 0^a|\otimes I)U(|0^a\rangle\otimes I) = A$.
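To see that such unitaries exist for any Hermitian $A$ with $\|A\| \leq 1$, one can fill in the unspecified blocks explicitly. The Python sketch below (NumPy only) uses the textbook choice $U = \begin{pmatrix} A & \sqrt{I-A^2} \\ \sqrt{I-A^2} & -A \end{pmatrix}$; this particular completion is an assumption made for illustration and is not the construction used later in this section. The function name `block_encode` is hypothetical.

```python
import numpy as np

def block_encode(A):
    """Return a 1-qubit block-encoding of a Hermitian A with ||A|| <= 1.

    Uses U = [[A, S], [S, -A]] with S = sqrt(I - A^2); A and S commute,
    which is what makes U unitary.
    """
    evals, evecs = np.linalg.eigh(A)
    # Clip tiny negative values caused by rounding before taking the root.
    s = np.sqrt(np.clip(1.0 - evals**2, 0.0, None))
    S = (evecs * s) @ evecs.conj().T
    return np.block([[A, S], [S, -A]])

rng = np.random.default_rng(0)
N = 4                                   # A acts on n = 2 qubits
G = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
A = (G + G.conj().T) / 2                # random Hermitian matrix
A = 0.8 * A / np.linalg.norm(A, 2)      # rescale so that ||A|| = 0.8

U = block_encode(A)
print("U is unitary:        ", np.allclose(U.conj().T @ U, np.eye(2 * N)))
print("upper-left block is A:", np.allclose(U[:N, :N], A))
```

Here the auxiliary qubit is the most significant one, so the upper-left $N\times N$ block of `U` is exactly $(\langle 0|\otimes I)U(|0\rangle\otimes I)$.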


Example 1: LCU does block-encoding. From Eq. (9.3) we can see that LCU (without the final
amplitude amplification) implements a $\lceil\log m\rceil$-qubit block-encoding of the matrix $A = M/\|\alpha\|_1$.

Example 2: Block-encoding a sparse Hermitian matrix. Let $A$ be a $2^n\times 2^n$ Hermitian
matrix of operator norm $\|A\| \leq 1$ that is $s$-sparse, so each row and column of $A$ have at most $s$
nonzero entries (for simplicity assume exactly $s$ nonzero entries). Since this matrix $A$ is still an
exponentially large object, we have to be careful how we can access such sparse matrices. First, we
assume we can query the entries of $A$ in the usual way: we have an oracle

$$O_A : |i,j\rangle|0\rangle \mapsto |i,j\rangle|A_{ij}\rangle,$$

where we assume the last register has sufficiently many qubits to write down the complex entry $A_{ij}$
either exactly or with sufficient precision. Of course, since $A$ is sparse, $A_{ij}$ will actually be 0 for
most $(i,j)$. Let $\nu(j,\ell) \in \{0,\ldots,N-1\}$ denote the location of the $\ell$-th nonzero entry of the $j$-th
column of $A$; so the $s$ nonzero entries in the $j$-th column are at positions $\nu(j,0),\ldots,\nu(j,s-1)$. We
also assume we have another oracle that allows us to find these locations:

$$O_{A,\mathrm{loc}} : |j,\ell\rangle \mapsto |j,\nu(j,\ell)\rangle.$$

We also assume we can run $O_A^{-1}$ and $O_{A,\mathrm{loc}}^{-1}$. Together these assumptions are called having “sparse
access” to $A$.

We will now show how to implement a block-encoding of the matrix $A/s$. Exercise 10 shows
how we can implement two $(2n+1)$-qubit unitaries that create superpositions over the locations of
the nonzero entries in the $j$-th column and $i$-th row of $A$, respectively:

$$W_1 : |0\rangle|0^n\rangle|j\rangle \mapsto \frac{1}{\sqrt{s}}\sum_{k:A_{kj}\neq 0}|0\rangle|k\rangle|j\rangle, \qquad W_3 : |0\rangle|0^n\rangle|i\rangle \mapsto \frac{1}{\sqrt{s}}\sum_{\ell:A_{i\ell}\neq 0}|0\rangle|i\rangle|\ell\rangle,$$

using one $O_{A,\mathrm{loc}}$-query and a few other $A$-independent gates. We can also implement the following
unitary using one query to each of $O_A$ and $O_A^{-1}$ and a few other $A$-independent gates (and some
auxiliary qubits that start and end in $|0\rangle$):

$$W_2 : |0\rangle|k,j\rangle \mapsto A_{kj}|0\rangle|k,j\rangle + \sqrt{1-|A_{kj}|^2}\,|1\rangle|k,j\rangle.$$

By following the action on initial state $|0^{n+1}j\rangle$ step-by-step (Exercise 10), one can show that the
$(0^{n+1}i, 0^{n+1}j)$-entry of $U = W_3^{-1}W_2W_1$ equals $A_{ij}/s$. In other words, $U$ is an $(n+a)$-qubit block-
encoding of matrix $A/s$ for some $a$ (depending on how many auxiliary qubits are actually used).

How can we use a given block-encoding $U$ of $A$? Suppose that for some function $f : \mathbb{R} \to \mathbb{R}$ we
want to implement a unitary $V$ that looks like

$$V = \begin{pmatrix} f(A) & \cdot \\ \cdot & \cdot \end{pmatrix}$$

using a small number of applications of the block-encoding of $A$. Here we don’t care what submatrices
sit at the ‘$\cdot$’ entries of $U$ or $V$, as long as the upper-left block of $V$ is $f(A)$ and $V$ as a whole
is unitary. For example, in Hamiltonian simulation $A$ would be the Hamiltonian $H$ and $f(x)$ would
be $e^{ixt}$, so that we are effectively implementing $f(H) = e^{iHt}$. In the HHL algorithm in the next
chapter, $f(x)$ will be $1/x$, so that we effectively implement $A^{-1}$.

It turns out that we can implement a good approximation of $V$ efficiently if we have a low-degree
polynomial $P$ approximating $f$. The idea is that we can let $P$ act on the eigenvalues of $A$, thus
transforming a block-encoding of $A$ into one of $P(A)$. We state without proof the following theorem
by Gilyén et al. [122, follows by applying their Theorem 56 to the real and to the imaginary part
of the polynomial], which extends work of Low et al. [177, 178, 176, 179, 178].

Theorem 1 Let $P : [-1,1] \to \{c \in \mathbb{C} \mid |c| \leq 1/4\}$ be a degree-$d$ polynomial, and let $U$ be a
unitary $a$-qubit block-encoding of Hermitian matrix $A$. We can implement a unitary $O(a)$-qubit
block-encoding $V$ of $P(A)$ using $d$ applications of $U$ and $U^{-1}$, one controlled application of $U$, and
$O(ad)$ other 2-qubit gates.


This theorem can be generalized to a powerful technique called “singular-value transforma- 
tion” [122], where A can be an arbitrary matrix, non-Hermitian and even non-square. 


9.4.1 Hamiltonian simulation via transforming block-encoded matrices 


Let’s see how we can use Theorem 1 for Hamiltonian simulation for a given sparse Hamiltonian $H$.
We again approximate the function $f(x) = e^{ixt}$ using a polynomial $P$ of degree $d = O(t + \log(1/\varepsilon))$,
which is the first $d$ terms of the Taylor series of $f$ (see Exercise 9), divided by 4 to ensure that
its range satisfies the condition of Theorem 1. If $H$ is $s$-sparse and we have sparse access to it,
then Example 2 of Section 9.4 shows how to efficiently implement a block-encoding $U$ of the scaled
Hamiltonian $H/s$, using $O(1)$ queries to $H$ and $O(n)$ other gates. Note that evolving Hamiltonian $H$
for time $t$ is the same as evolving $H/s$ for time $st$. Theorem 1 now gives us a block-encoding $V$ of
$P(H) \approx \frac{1}{4}e^{iHt}$. This $V$ invokes $U$ and $U^{-1}$ $O(st + \log(1/\varepsilon))$ times, and maps:

$$V : |0\rangle|\psi\rangle \mapsto |0\rangle P(H)|\psi\rangle + |\phi\rangle,$$

where $|\phi\rangle$ has no support on basis states starting with $|0\rangle$. Since $P(H) \approx \frac{1}{4}e^{iHt}$ is essentially
proportional to a unitary, we can now apply $O(1)$ rounds of oblivious amplitude amplification to
boost the factor $\frac{1}{4}$ to essentially 1, using only one copy of $|\psi\rangle$.
This implements the desired unitary $e^{iHt}$ on one copy of $|\psi\rangle$, up to small error. The complexity
of $\varepsilon$-precise Hamiltonian simulation of an $s$-sparse Hamiltonian $H$ of operator norm $\leq 1$ then
becomes $O(st + \log(1/\varepsilon))$ queries to $H$ and $O(n(st + \log(1/\varepsilon)))$ 2-qubit gates.
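The polynomial side of this argument is easy to check classically. The Python sketch below (NumPy only) evaluates the first $d$ Taylor terms of $e^{ixt}$, divided by 4, directly on a matrix $H$ and compares the result with $\frac{1}{4}e^{iHt}$. It only illustrates the approximation quality of $P$; it does not implement the block-encoding circuit of Theorem 1. The random $H$, the values of `t` and `d`, and the helper names are illustrative assumptions.

```python
import numpy as np

def expi(H, t=1.0):
    """Return exp(i*H*t) for a Hermitian matrix H, via its eigendecomposition."""
    evals, evecs = np.linalg.eigh(H)
    return (evecs * np.exp(1j * evals * t)) @ evecs.conj().T

def taylor_poly(H, t, d):
    """P(H): the first d Taylor terms (k = 0..d-1) of exp(iHt), divided by 4."""
    term = np.eye(H.shape[0], dtype=complex)   # the k = 0 term
    total = term.copy()
    for k in range(1, d):
        term = term @ (1j * t * H) / k         # (iHt)^k / k!
        total += term
    return total / 4.0

rng = np.random.default_rng(2)
G = rng.normal(size=(8, 8)) + 1j * rng.normal(size=(8, 8))
H = (G + G.conj().T) / 2
H = H / np.linalg.norm(H, 2)                   # normalize so that ||H|| = 1

t = 2.0
for d in (5, 10, 20, 30):
    err = np.linalg.norm(taylor_poly(H, t, d) - expi(H, t) / 4.0, 2)
    print(f"d={d:2d}  ||P(H) - e^(iHt)/4|| = {err:.3e}")
```

The error falls off super-exponentially once $d$ exceeds a small multiple of $t$, which is the behaviour behind the $d = O(t + \log(1/\varepsilon))$ degree bound.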


Exercises 


1. Compute the following five $2\times 2$ unitaries: $e^{i\pi X}$, $e^{i\pi Z}$, $e^{i\pi X}e^{i\pi Z}$, $e^{i\pi Z}e^{i\pi X}$, and $e^{i\pi(X+Z)}$. Here
$X$ and $Z$ are the usual Pauli matrices.


2. Suppose we want to implement a certain unitary $U$, and we can do that by switching on a
Hamiltonian $H$ for some time $t$: $U = e^{-iHt}$. Now suppose $H'$ is another Hamiltonian, with
100 times as much energy as $H$: $H' = 100H$. Show that using $H'$ we can implement $U$ a 100
times faster than with $H$.

Comment: This exercise is about time in the physical sense of the word, not about “time complexity” in the
sense of circuit size. It shows why some kind of normalization of $H$ is needed if we want to talk about the time
it takes to implement something. We can always “speed up” a computation by a factor $k$ if we can multiply
our Hamiltonian with a factor $k$.


3. (H) This exercise justifies the error bound of Eq. (9.2). Let $\delta > 0$ be small (in Eq. (9.2) we’d
set $\delta = t/r$). Show that there is a constant $c$, independent of $\delta$ and $H_1,\ldots,H_m$, such that

$$e^{iH_1\delta + \cdots + iH_m\delta} = e^{iH_1\delta}\cdots e^{iH_m\delta} + E,$$

for some $E$ of norm $\|E\| \leq c\delta^2m^2$.


4. Consider the simple case of the linear-combination-of-unitaries trick where $m = 2$ and $M =
V_1 + V_2$. Describe the unitaries $V$ and $W$, and track the initial state $|0\rangle|\psi\rangle$ through the 4-step
algorithm in Section 9.3.

5. (a) Write the gate $G$ (a $2\times 2$ matrix whose entries are not recoverable from this copy) as a linear
combination of the Pauli matrices (see Appendix A.9).

(b) Suppose you want to implement $G$ via LCU, using the linear combination of (a). What
are $W|0\rangle$ and $V$?

(c) Consider the final state after the 4-step algorithm of Section 9.3. Calculate the part of
that state that starts with $|0\rangle$ (without writing out that final state fully!)


6. (H) Give a calculation to justify that the 4-step algorithm in Section 9.3 indeed always 
produces a state of the form of Eq. (9.3). 


7. Let $v \in [-1,1]^N$ be a vector with real entries, of dimension $N = 2^n$, indexed by $i \in \{0,1\}^n$.
Suppose we can query the entries of this vector by a unitary that maps

$$O_v : |i\rangle|0^p\rangle \mapsto |i\rangle|v_i\rangle,$$

where the binary representation of the $i$-th entry of $v$ is written into the second register.
We assume this second register has $p$ qubits, and the numbers $v_i$ can all be written exactly
with $p$ bits of precision (it doesn’t matter how, but for concreteness say that the first bit
indicates the sign of the number, followed by the $p-1$ most significant bits after the decimal
dot). Our goal is to prepare the $n$-qubit quantum state $|\psi\rangle = \frac{1}{\|v\|}\sum_{i\in\{0,1\}^n}v_i|i\rangle$.

(a) Show how you can implement the following 3-register map (where the third register is
one qubit) using one application of $O_v$ and one of $O_v^{-1}$, and some $v$-independent unitaries
(you don’t need to draw detailed circuits for these unitaries, nor worry about how to
write those in terms of elementary gates).

$$|i\rangle|0^p\rangle|0\rangle \mapsto |i\rangle|0^p\rangle\Big(v_i|0\rangle + \sqrt{1-v_i^2}\,|1\rangle\Big).$$

(b) Suppose you apply the map of (a) to a uniform superposition over all $i \in \{0,1\}^n$. Write
the resulting state, and calculate the probability that measuring the last qubit in the
computational basis gives outcome 0.

(c) What is the resulting 3-register state if the previous measurement gave outcome 0?

(d) Assume you know $\|v\|$ exactly. Give an algorithm that prepares $|\psi\rangle$ exactly, using
$O\big(\sqrt{N}/\|v\|\big)$ applications of $O_v$ and $O_v^{-1}$, and some $v$-independent unitaries.
8. (H) This exercise explains oblivious amplitude amplification.
Let $M$ be an $n$-qubit unitary. We start from $|\Psi\rangle = |0^a\rangle|\psi\rangle$ for unknown $n$-qubit state $|\psi\rangle$.
Our goal is to prepare the state $|\Phi\rangle = |0^a\rangle M|\psi\rangle$ (this $|\Phi\rangle$ is the analogue of the “good state”
in amplitude amplification). Let $U$ be an $(a+n)$-qubit unitary, independent of $|\psi\rangle$, such that

$$U|\Psi\rangle = \sin(\theta)|\Phi\rangle + \cos(\theta)|\Phi^\perp\rangle,$$

where $\theta$ is some angle that’s independent of $|\psi\rangle$, while $|\Phi^\perp\rangle$ is some normalized state that
depends on $|\psi\rangle$ and has no support on basis states starting with $0^a$ (this $|\Phi^\perp\rangle$ is the analogue
of the “bad state”). If $\theta$ is close to $\pi/2$, then we can just apply $U$ to our starting state $|\Psi\rangle$
and measure the first register; we’ll see $0^a$ with probability $\sin(\theta)^2 \approx 1$ and in that case end
up with the desired state $|\Phi\rangle$. But suppose $\theta$ is quite small. Here we will see how we can
amplify the angle $\theta$ to roughly $\pi/2$, without assuming a unitary to prepare $|\Psi\rangle$.

(a) Let $S$ be the 2-dimensional space spanned by $|\Phi\rangle$ and $|\Phi^\perp\rangle$. Let $R = (I - 2|0^a\rangle\langle 0^a|)\otimes I$
be a unitary that puts a ‘$-$’ in front of every basis state that starts with $0^a$. Show that
$R$, restricted to $S$, is a reflection through $|\Phi^\perp\rangle$.

(b) Define $|\Psi^\perp\rangle = U^{-1}(\cos(\theta)|\Phi\rangle - \sin(\theta)|\Phi^\perp\rangle)$. Show $U|\Psi\rangle$ and $U|\Psi^\perp\rangle$ are orthogonal.
One can also show with a bit more work [54, Lemma 3.7] the stronger statement that
$|\Psi^\perp\rangle$ has no support on basis states starting with $0^a$. You may assume this fact without
proof in the remainder of this exercise.

(c) Show that $-URU^{-1}$, restricted to $S$, is a reflection through $U|\Psi\rangle$ (note the minus sign!)

(d) Show that $(-URU^{-1}R)^kU|0^a\rangle|\psi\rangle = \sin((2k+1)\theta)|\Phi\rangle + \cos((2k+1)\theta)|\Phi^\perp\rangle$.

(e) How large should we take $k$ in order to end up with (approximately) the state $|\Phi\rangle$?

NB: If you know $\theta$ exactly, then you can even exactly prepare $|\Phi\rangle$ (along the lines of Exercise 7.5) but
you don’t need to show that.

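9. (H) Let $\varepsilon \in (0, 0.99)$. Show that you can choose a sufficiently large constant $c$ (independent
of $t$ and $\varepsilon$) such that for all Hermitian $H$ with operator norm $\|H\| \leq 1$, we have

$$\Big\|e^{iHt} - \sum_{k=0}^{c(t+\log(1/\varepsilon))-1}\frac{(iHt)^k}{k!}\Big\| = \Big\|\sum_{k=c(t+\log(1/\varepsilon))}^{\infty}\frac{(iHt)^k}{k!}\Big\| \leq \varepsilon.$$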


10. This exercise looks at the details of block-encoding an $s$-sparse matrix $A$ with $\|A\| \leq 1$ from
Section 9.4. Assume for simplicity that the entries of $A$ are real.

(a) Show how to implement $W_1$ using an $O_{A,\mathrm{loc}}$-query and a few other $A$-independent gates.
For simplicity you may assume $s$ is a power of 2 here, and you can use arbitrary single-
qubit gates, possibly controlled by another qubit.
(Note that the same method also allows us to implement $W_3$.)

(b) (H) Show how to implement $W_2$ using an $O_A$-query, an $O_A^{-1}$-query, and a few other
$A$-independent gates (you may use auxiliary qubits as long as those start and end in
$|0\rangle$).

(c) Show that the $(0^{n+1}i, 0^{n+1}j)$-entry of $W_3^{-1}W_1$ is $1/s$ if $A_{ij} \neq 0$, and is 0 if $A_{ij} = 0$.

(d) Show that the $(0^{n+1}i, 0^{n+1}j)$-entry of $W_3^{-1}W_2W_1$ is exactly $A_{ij}/s$.


11. (a) Give a quantum circuit on $n+1$ qubits that uses $O(n)$ gates, no auxiliary qubits, and
computes the parity of the first $n$ bits, in the sense that it maps

$$|x\rangle|0\rangle \mapsto |x\rangle\Big|\sum_{i=1}^n x_i \bmod 2\Big\rangle \quad\text{for all } x \in \{0,1\}^n.$$

(b) Let $P = Z\otimes Z\otimes\cdots\otimes Z$ be an $n$-qubit Hamiltonian, where $Z$ is the usual phase-flip
Pauli matrix. What is the result of applying matrix $P$ to an $n$-qubit basis state $|x\rangle$?

(c) Fix some positive real number $t$. Let $U = e^{iPt}$ be the $n$-qubit unitary induced by
applying the above Hamiltonian for time $t$ (via the Schrödinger equation, dropping the
minus sign in the exponent like in Section 9.1). What is the result of applying $U$ to an
$n$-qubit basis state $|x\rangle$?

(d) (H) Give a quantum circuit with $O(n)$ gates that implements $U$ exactly using one auxiliary
qubit that starts and ends in $|0\rangle$. You may use arbitrary single-qubit gates, which
may be controlled by another qubit.

(e) (H) Now suppose $P$ is a product of $n$ arbitrary Pauli matrices, not necessarily all-$Z$.
Show how to implement $U = e^{iPt}$.


12. Suppose you have a classical description of an $n$-qubit Hamiltonian $H$ that is the sum of
$m = n^2$ 2-local terms. Assume the eigenvalues of the Hermitian matrix $H$ lie in $[0,1)$, and
can all be written exactly with $2\log n$ bits of precision. You would like to exactly determine the
smallest eigenvalue $\lambda_{\min}$ of $H$, corresponding to unknown $n$-qubit eigenstate $|\psi_{\min}\rangle$. You’re
given (as a quantum state) an $n$-qubit state $|\psi\rangle$ that has a significant overlap with $|\psi_{\min}\rangle$:
$|\langle\psi|\psi_{\min}\rangle|^2 \geq 0.7$. Give a polynomial-size quantum circuit that, with probability $\geq 2/3$,
outputs $\lambda_{\min}$ exactly.

NB: You don’t need to write down the circuit to the last detail; a clear description of the different parts of the
circuit (possibly with some reference to details in the lecture notes) suffices.

Chapter 10


The HHL Algorithm 


10.1 The linear-system problem 


In this chapter we present the Harrow-Hassidim-Lloyd (HHL [134]) algorithm for solving large
systems of linear equations. Such a system is given by an $N\times N$ matrix $A$ with real or complex
entries, and an $N$-dimensional nonzero vector $b$. Assume for simplicity that $N = 2^n$. The linear-
system problem is

LSP: find an $N$-dimensional vector $x$ such that $Ax = b$.

Solving large systems of linear equations is extremely important in many computational problems
in industry, in science, in optimization, in machine learning, etc. In many applications it suffices
to find a vector $\tilde{x}$ that is close to the actual solution $x$.

We will assume $A$ is invertible (equivalently, has rank $N$) in order to guarantee the existence
of a unique solution vector $x$, which is then just $A^{-1}b$. This assumption is just for simplicity: if
$A$ does not have full rank, then the methods below would still allow us to invert it on its support,
replacing $A^{-1}$ by the “Moore-Penrose pseudoinverse.”

The HHL algorithm can solve “well-behaved” large linear systems very fast (under certain
assumptions), but in a rather weak sense: instead of outputting the $N$-dimensional solution vector
$x$ itself, its goal is to output the $n$-qubit state

$$|x\rangle = \frac{1}{\|x\|}\sum_{i=0}^{N-1}x_i|i\rangle,$$

or some other $n$-qubit state close to $|x\rangle$. This state $|x\rangle$ has the solution vector as its vector of
amplitudes, up to normalization. This is called the quantum linear-system problem:

QLSP: find an $n$-qubit state $|\tilde{x}\rangle$ such that $\||x\rangle - |\tilde{x}\rangle\| \leq \varepsilon$ and $Ax = b$.

Note that the QLSP is an inherently quantum problem, since the goal is to produce an $n$-qubit
state whose amplitude-vector (up to normalization and up to $\varepsilon$-error) is a solution to the linear
system. In general this is not as useful as just having the $N$-dimensional vector $x$ written out on
a piece of paper, but in some cases where we only want some partial information about $x$, it may
suffice to just (approximately) construct $|x\rangle$.

We will assume without loss of generality that $A$ is Hermitian (see Exercise 1). Let us state the
more restrictive assumptions that will make the linear system “well-behaved” and suitable for the
HHL algorithm:

1. We have a unitary that can prepare the vector $b$ as an $n$-qubit quantum state $|b\rangle = \frac{1}{\|b\|}\sum_i b_i|i\rangle$
using a circuit of $B$ 2-qubit gates. We also assume for simplicity that $\|b\| = 1$.

2. The matrix $A$ is $s$-sparse and we have sparse access to it, like in Section 9.4. Such sparsity
is not essential to the algorithm, and could be replaced by other properties that enable an
efficient block-encoding of $A$.

3. The matrix $A$ is well-conditioned: the ratio between its largest and smallest singular value
is at most some $\kappa$.¹ For simplicity, assume the smallest singular value is $\geq 1/\kappa$ while the
largest is $\leq 1$. In other words, all eigenvalues of $A$ lie in the interval $[-1,-1/\kappa]\cup[1/\kappa,1]$.
The smaller the “condition number” $\kappa$ is, the better it will be for the algorithm. Let’s assume
our algorithm knows $\kappa$, or at least knows a reasonable upper bound on $\kappa$.


10.2 The basic HHL algorithm for linear systems 


Let us start with some intuition. The solution vector $x$ that we are looking for is $A^{-1}b$, so we
would like to apply $A^{-1}$ to $b$. Because $A$ is assumed to be Hermitian, it has spectral decomposition
$A = \sum_{j=0}^{N-1}\lambda_ja_ja_j^*$, where the vectors $a_j$ are an orthonormal basis of eigenvectors for the whole
$N$-dimensional space, and $\lambda_j \in \mathbb{R}$ are the corresponding eigenvalues. Then the map $A^{-1}$ is the
same as the map $a_j \mapsto \frac{1}{\lambda_j}a_j$: we just want to multiply the eigenvector $a_j$ with the scalar $1/\lambda_j$.
The vector $b$ can also be written as a linear combination of the eigenvectors $a_j$: $b = \sum_j \beta_ja_j$ (we
don’t need to know the coefficients $\beta_j$ for what follows). We want to apply $A^{-1}$ to $b$ to obtain
$A^{-1}b = \sum_j \beta_j\frac{1}{\lambda_j}a_j$, normalized, as an $n$-qubit quantum state.

Unfortunately the maps $A$ and $A^{-1}$ are not unitary (unless $|\lambda_j| = 1$ for all $j$), so we cannot just
apply $A^{-1}$ as a quantum operation to state $|b\rangle$ to get state $|x\rangle$. Fortunately $U = e^{iA} = \sum_j e^{i\lambda_j}a_ja_j^*$
is unitary, and has the same eigenvectors as $A$ and $A^{-1}$. We can implement $U$ and powers of $U$ by
Hamiltonian simulation, and then use phase estimation (Section 4.6) to estimate the $\lambda_j$ associated
with eigenvector $|a_j\rangle$ with some small approximation error (for this sketch, assume for simplicity
that the error is 0). Conditioned on our estimate of $\lambda_j$ we can then rotate an auxiliary $|0\rangle$-qubit
to $\frac{1}{\kappa\lambda_j}|0\rangle + \sqrt{1-\frac{1}{(\kappa\lambda_j)^2}}\,|1\rangle$ (this is a valid state because $|\kappa\lambda_j| \geq 1$). Next we undo the phase
estimation to set the register that contained the estimate back to $|0\rangle$. Suppressing the auxiliary
qubits containing the temporary results of the phase estimation (these qubits start and end in state
$|0\rangle$), we have now unitarily mapped

$$|a_j\rangle|0\rangle \mapsto |a_j\rangle\Big(\frac{1}{\kappa\lambda_j}|0\rangle + \sqrt{1-\frac{1}{(\kappa\lambda_j)^2}}\,|1\rangle\Big).$$

If we prepare a copy of $|b\rangle|0\rangle = \sum_j \beta_j|a_j\rangle|0\rangle$ and apply the above unitary map to it, then we obtain

$$\sum_j \beta_j|a_j\rangle\Big(\frac{1}{\kappa\lambda_j}|0\rangle + \sqrt{1-\frac{1}{(\kappa\lambda_j)^2}}\,|1\rangle\Big) = \frac{1}{\kappa}\underbrace{\sum_j \beta_j\frac{1}{\lambda_j}|a_j\rangle}_{\propto|x\rangle}|0\rangle + |\phi\rangle|1\rangle,$$

where we don’t care about the (subnormalized) state $|\phi\rangle$. Note that because $\sum_j |\beta_j/\lambda_j|^2 \geq
\sum_j |\beta_j|^2 = 1$, the norm of the part of the state ending in qubit $|0\rangle$ is at least $1/\kappa$. Accordingly,
we can now apply $O(\kappa)$ rounds of amplitude amplification to amplify this part of the state
to have amplitude essentially 1. This prepares state $|x\rangle$ to good approximation, as intended.

This rough sketch (which Exercise 2 asks you to make more precise) is the basic idea of HHL.
It leads to an algorithm that produces a state $|\tilde{x}\rangle$ that is $\varepsilon$-close to $|x\rangle$, using roughly $\kappa^2s/\varepsilon$ queries
to $A$ and roughly $\kappa s(\kappa n/\varepsilon + B)$ other 2-qubit gates.

¹This ratio is called the condition number. Note that the assumption that $A$ is invertible is equivalent to having
a finite condition number. The stronger assumption that the condition number is small, intuitively says that $A$ is
invertible in a stable or robust way, so that small errors (say due to noise or to finite-precision rounding) don’t lead
to massive errors in the solution vector $x$.
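The linear algebra behind this sketch can be emulated classically on a small instance. The Python sketch below (NumPy only) assumes exact eigenvalue estimates, as the sketch above does, and simply carries out the eigenvalue inversion $\beta_j \mapsto \beta_j/(\kappa\lambda_j)$ in the eigenbasis. It is a check of the amplitudes involved, not a quantum circuit, and the random instance and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N, kappa = 8, 4.0

# Random Hermitian A with eigenvalues in [-1, -1/kappa] U [1/kappa, 1].
lam = rng.uniform(1.0 / kappa, 1.0, N) * rng.choice([-1.0, 1.0], N)
Q, _ = np.linalg.qr(rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N)))
A = (Q * lam) @ Q.conj().T          # A = sum_j lam_j a_j a_j^*, columns of Q are a_j

b = rng.normal(size=N) + 1j * rng.normal(size=N)
b /= np.linalg.norm(b)              # the text assumes ||b|| = 1

beta = Q.conj().T @ b               # coefficients of b in the eigenbasis
good = Q @ (beta / (kappa * lam))   # part of the state flagged by auxiliary |0>
amp = np.linalg.norm(good)          # its norm, which should be >= 1/kappa

x = np.linalg.solve(A, b)           # classical solution, for comparison
print("norm of the |0>-part:", amp, " lower bound 1/kappa:", 1.0 / kappa)
print("matches |x>:", np.allclose(good / amp, x / np.linalg.norm(x)))
```

The flagged part is exactly $\frac{1}{\kappa}A^{-1}b$, so after normalization it coincides with $|x\rangle$, and its norm is what determines the $O(\kappa)$ rounds of amplitude amplification.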


10.3 Improving the efficiency of the HHL algorithm 


The complexity of the above basic HHL algorithm can be improved further. Gilyén et al. [122]
used the singular-value transformation technique of Section 9.4 to implement $A^{-1}$, improving on
an LCU construction due to Childs et al. [83]. We would like to apply the function $f(x) = 1/x$ to
a block-encoding of $A$ in order to get a block-encoding of $A^{-1}$ (up to normalization) that we can
then apply to $|b\rangle|0\rangle$.

The function $f(x) = 1/x$ is not itself a polynomial, so we need to approximate it by a low-degree
polynomial to be able to apply Theorem 1 of Chapter 9. Childs et al. [83, Lemmas 17-19] started
from the following polynomial of degree $D = 2b-1$ for $b = O(\kappa^2\log(\kappa/\varepsilon))$:

$$\frac{1-(1-x^2)^b}{x}.$$

This is indeed a polynomial because all terms in the numerator have degree $\geq 1$, so we can divide
out the $x$ of the denominator. Since $(1-x^2)^b$ is close to 0 (unless $|x|$ is small), this polynomial
is indeed close to $1/x$ (unless $|x|$ is small, but we won’t care about that because we’ll apply this
polynomial to a matrix whose eigenvalues aren’t close to 0). More precisely, this polynomial is
$\varepsilon/2$-close to $1/x$ whenever $x$ lies in the interval $E_\kappa = [-1,-1/\kappa]\cup[1/\kappa,1]$. Its range on this domain
is $[-\kappa,-1]\cup[1,\kappa]$ (ignoring the small $\varepsilon$ for simplicity). Like every degree-$D$ polynomial, it can be
written exactly as a linear combination of the first $D+1$ Chebyshev polynomials of the first kind.² Childs et
al. show that the coefficients in this sum decrease quickly for larger degree, and that dropping the
Chebyshev polynomials of degree higher than $d = O(\kappa\log(\kappa/\varepsilon))$ incurs only small error $\varepsilon/2$. The
resulting degree-$d$ polynomial $p$ $\varepsilon$-approximates $1/x$ on the interval $E_\kappa$, and its largest value (in
absolute value) on this domain is $\kappa$. Now define the polynomial $P = p/(4\kappa)$. This has the same
degree $d$ as $p$, but a range $[-1/4,1/4]$ that fits the assumption of Theorem 1 of Chapter 9 (there’s
a trick to ensure the values of $P$ are within that range even for $x \approx 0$, which we’ll skip here).
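The quality of the starting polynomial is easy to check numerically. The Python sketch below (NumPy only) evaluates $(1-(1-x^2)^b)/x$ on a grid covering $E_\kappa$ and compares it with $1/x$. The specific choice $b = \lceil\kappa^2\ln(2\kappa/\varepsilon)\rceil$ is our own assumption, one value consistent with $b = O(\kappa^2\log(\kappa/\varepsilon))$, and the Chebyshev truncation step is not reproduced here.

```python
import math
import numpy as np

kappa, eps = 5.0, 1e-3
# Assumed constant: b = ceil(kappa^2 * ln(2*kappa/eps)), so that
# kappa * (1 - 1/kappa^2)^b <= kappa * exp(-b/kappa^2) <= eps/2.
b = math.ceil(kappa**2 * math.log(2.0 * kappa / eps))
D = 2 * b - 1                        # degree of the polynomial

def g(x):
    """The polynomial (1 - (1 - x^2)^b) / x, evaluated pointwise for x != 0."""
    return (1.0 - (1.0 - x**2) ** b) / x

pos = np.linspace(1.0 / kappa, 1.0, 2001)
grid = np.concatenate([-pos[::-1], pos])     # sample points covering E_kappa
max_err = np.max(np.abs(g(grid) - 1.0 / grid))

print("b =", b, " degree D =", D)
print("max |g(x) - 1/x| on E_kappa:", max_err, " target eps/2:", eps / 2)
print("max |g(x)| on E_kappa      :", np.max(np.abs(g(grid))))
```

The worst-case error sits at $|x| = 1/\kappa$, where $(1-x^2)^b/|x|$ is largest, and the largest value of the polynomial on $E_\kappa$ is essentially $\kappa$, as stated above.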

As we saw in Section 9.4, we can implement a block-encoding of the $s$-sparse matrix $A/s$ using 
$O(1)$ sparse-access queries to $A$ and $O(n)$ other gates. Using a factor $O(s)$ more work, we can 
turn this into a block-encoding of $A$ itself (alternatively, we could directly invert the matrix $A/s$, 
whose singular values are $\geq 1/(\kappa s)$). We now apply Theorem 1 with this block-encoding of $A$, 
and the polynomial $P = p/(4\kappa)$, of degree $d = O(\kappa \log(\kappa/\varepsilon))$. Note that all eigenvalues of $A$ lie 
in the interval $E_\kappa$, where $p(x) \approx 1/x$, hence $p(A) \approx A^{-1}$ and $P(A) \approx \frac{1}{4\kappa} A^{-1}$. Theorem 1 then 
gives us a block-encoding of $P(A)$, at the expense of running the block-encoding of $A$ $O(d)$ times. 
Using $O(\kappa)$ rounds of amplitude amplification on top of this, we can get rid of the $1/(4\kappa)$ factor 
and end up with essentially the state $A^{-1}|b\rangle$, normalized.³ This gives a quantum algorithm that 
solves the QLSP using $O(d\kappa s) = O(\kappa^2 s \log(\kappa/\varepsilon))$ queries to $A$, and $O(\kappa s(\kappa n \log(\kappa/\varepsilon) + B))$ 2-qubit 
gates. Note that compared to basic HHL, the dependence on $1/\varepsilon$ has been improved from linear to 
logarithmic. The dependence on $\kappa$ can also be improved from quadratic to linear, using a technique 
called “variable-time amplitude amplification” [15, 83, 79, 169] that we won’t explain here. 


² These univariate polynomials are defined recursively as follows: $T_0(x) = 1$, $T_1(x) = x$, and $T_{d+1}(x) = 2xT_d(x) - T_{d-1}(x)$. 
Note that $T_d$ has degree $d$, and maps $[-1, 1]$ to $[-1, 1]$. The polynomials $T_0, \ldots, T_D$ are linearly independent 
(even orthonormal in a certain way) and hence span the set of all univariate polynomials of degree $\leq D$. 

The HHL algorithm can in some cases solve the QLSP exponentially faster than classical 
algorithms can solve the LSP. In particular, if the sparsity $s$, the condition number $\kappa$, and the cost $B$ of 
preparing $|b\rangle$ are all $\leq \mathrm{polylog}(N)$, and the allowed error is $\varepsilon \geq 2^{-\mathrm{polylog}(N)}$, then this improved 
version of the HHL algorithm uses $\mathrm{polylog}(N)$ queries and gates to solve (in a quantum way) an 
$N$-dimensional linear system. It can also be used for other tasks, for instance approximately solving 
differential equations and other applications in scientific computing, see the lecture notes of Lin 
Lin [168] and references therein. 


Exercises 


1. Suppose we are given an arbitrary invertible $N \times N$ matrix $A$ and an $N$-dimensional vector $b$. 


(a) Give a Hermitian $2N \times 2N$ matrix $A'$ (depending on $A$ but not on $b$) and $2N$-dimensional 
vector $b'$ (depending on $b$ but not on $A$), such that a solution $x$ to the linear system 
$Ax = b$ can be read off from a solution to the system $A'x' = b'$. 


(b) How does the condition number of your A’ relate to that of A? 


2. This exercise asks you to add more details to the sketch of the basic HHL algorithm given at 
the start of Section 10.2. For simplicity we will only count queries, not gates. 


(a) Use Hamiltonian simulation and phase estimation to implement the following unitary 
map: 

$$|a_j\rangle|0\rangle \mapsto |a_j\rangle|\tilde{\lambda}_j\rangle,$$

where $|\tilde{\lambda}_j\rangle$ is a superposition over estimates of $\lambda_j$, which (if measured) gives with 
probability $\geq 0.99$ an estimator $\ell \in [-1, 1]$ such that $|\lambda_j - \ell| \leq \varepsilon/\kappa$. Your implementation is 
allowed to use $O(\kappa s/\varepsilon + \log(\kappa/\varepsilon))$ queries to the sparse matrix $A$. You may invoke the 
best Hamiltonian simulator for sparse matrices from Section 9.4. 


(b) Show that the basic HHL algorithm can be implemented using $O(\kappa^2 s/\varepsilon + \kappa \log(\kappa/\varepsilon))$ 
sparse-access queries to $A$. To make your life easier, you may assume that $|\tilde{\lambda}_j\rangle$ is just 
one basis state, so one estimator which is close to $\lambda_j$ rather than a superposition over 
estimators (and hence the success probability 0.99 is actually 1). You may also assume 
for simplicity that the amplitude amplification at the end works perfectly. 


3. Suppose $A$ and $B$ are sparse, well-conditioned $N \times N$ matrices, and we can efficiently generate 
vector $b \in \mathbb{R}^N$ as a quantum state $|b\rangle$. Here we will see how we can efficiently find the solution 
to the linear system $ABx = b$ as a quantum state $|x\rangle$. 


More precisely, assume $N = 2^n$. Assume we have a unitary circuit to produce the $n$-qubit 
state $|b\rangle$ with a number of elementary gates that’s polynomial in $n$. Let $s_A$ and $s_B$ be the 
sparsities of the matrices $A$ and $B$, respectively, and $\kappa_A$ and $\kappa_B$ be their condition numbers 
(ratio of largest over smallest singular value). Let $x \in \mathbb{C}^N$ be $B^{-1}A^{-1}b$, which is the unique 
solution to the linear system $ABx = b$. Show how you can produce an $n$-qubit state $|\tilde{x}\rangle$ that 
is $\varepsilon$-close (in the usual Euclidean distance) to the $n$-qubit state $|x\rangle = \frac{1}{\|x\|}\sum_{i=0}^{N-1} x_i|i\rangle$, using 
a number of queries (to matrix entries) and elementary gates that is polynomial in $s_A$, $s_B$, 
$\kappa_A$, $\kappa_B$, $1/\varepsilon$, and $n$. 

Comment: Again, the point here is to avoid the polynomial dependence on the dimension $N$ that classical 
linear solvers would have, and replace that by a polynomial dependence on $n = \log N$. 


³ Note that we need to assume a unitary to prepare $|b\rangle$ here; having just one copy of $|b\rangle$ is not enough. We cannot 
use oblivious amplitude amplification because that assumes we have a block-encoding of a matrix that is proportional 
to a unitary (or close to that), which $A^{-1}$ is not. 


Chapter 11 


Quantum Query Lower Bounds 


11.1 Introduction 


Most of the algorithms we have seen so far worked in the query model. Here the goal usually is 
to compute some function $f : \{0,1\}^N \to \{0,1\}$ on a given input $x = x_0 \ldots x_{N-1} \in \{0,1\}^N$. The 
distinguishing feature of the query model is the way $x$ is accessed: $x$ is not given explicitly, but 
is stored in a random access memory, and we’re being charged unit cost for each query that we 
make to this memory. Informally, a query asks for and receives the $i$-th element $x_i$ of the input. 
Formally, we model a query unitarily as the following 2-register quantum operation $O_x$, where the 
first register is $N$-dimensional and the second is 2-dimensional:¹ 

$$O_x : |i, b\rangle \mapsto |i, b \oplus x_i\rangle.$$

In particular, $|i, 0\rangle \mapsto |i, x_i\rangle$. This only states what $O_x$ does on basis states, but by linearity this 
determines the full unitary. Note that a quantum algorithm can apply $O_x$ to a superposition of 
basis states, gaining some sort of access to several input bits $x_i$ at the same time. 
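The action of $O_x$ can be made concrete with a small classical simulation. The sketch below is only an illustration: the indexing convention (basis state $|i, b\rangle$ stored at position $2i + b$) and the function name `apply_query` are choices made here, not part of the text.

```python
def apply_query(x, state):
    """Apply O_x : |i, b> -> |i, b XOR x_i> to a state vector.

    Assumed layout for this sketch: the amplitude of |i, b> sits at index 2*i + b.
    """
    n = len(x)
    assert len(state) == 2 * n
    out = [0j] * (2 * n)
    for i in range(n):
        for b in (0, 1):
            out[2 * i + (b ^ x[i])] += state[2 * i + b]
    return out

x = [0, 1, 1, 0]
# Uniform superposition over the basis states |i, 0>.
amp = 1 / len(x) ** 0.5
state = [amp if idx % 2 == 0 else 0.0 for idx in range(2 * len(x))]
after = apply_query(x, state)
# A single query has moved each amplitude to |i, x_i>.
for i in range(len(x)):
    print(i, x[i], abs(after[2 * i + x[i]]))
```

Since XOR-ing with $x_i$ twice is the identity, applying `apply_query` a second time restores the original state, which reflects that $O_x$ is its own inverse.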

A $T$-query quantum algorithm starts in a fixed state, say the all-0 state $|0 \ldots 0\rangle$, and then 
interleaves fixed unitary transformations $U_0, U_1, \ldots, U_T$ with queries. The algorithm’s fixed unitaries 
may act on a workspace-register, in addition to the two registers on which $O_x$ acts. In this case we 
implicitly extend $O_x$ by tensoring it with the identity operation on this extra register, so it maps 

$$O_x : |i, b, w\rangle \mapsto |i, b \oplus x_i, w\rangle.$$

Hence the final state of the algorithm can be written as the following matrix-vector product: 

$$U_T O_x U_{T-1} O_x \cdots O_x U_1 O_x U_0 |0 \ldots 0\rangle.$$


This state depends on the input x only via the T queries. The output of the algorithm is obtained 
by a measurement of the final state. For instance, if the output is Boolean, the algorithm could 
just measure the final state in the computational basis and output the first bit of the result. 

The query complexity of some function $f$ is now the minimal number of queries needed for an 
algorithm that outputs the correct value $f(x)$ for every $x$ in the domain of $f$ (with error probability 
at most 1/3, say). Note that we just count queries to measure the complexity of the algorithm,² 
while the intermediate fixed unitaries are treated as costless. 


¹ If the input $x$ consists of non-binary items $x_i$ (as is the case for instance with the input for Simon’s algorithm) 
then those can be simulated by querying individual bits of each $x_i$. 

In many cases, the overall computation time of quantum query algorithms (as measured by 
the total number of elementary gates, say) is not much bigger than the query complexity. This 
justifies analyzing the latter as a proxy for the former. This is the model in which essentially all 
the quantum algorithms we’ve seen work: Deutsch-Jozsa, Simon, Grover, the various random walk 
algorithms. Even the period-finding algorithm that is the quantum core of Shor’s algorithm works 
because it needs only few queries to the periodic function. 


11.2 The polynomial method 


From quantum query algorithms to polynomials. An $N$-variate multilinear polynomial $p$ 
is a function $p : \mathbb{C}^N \to \mathbb{C}$ that can be written as 

$$p(x_0, \ldots, x_{N-1}) = \sum_{S \subseteq \{0,\ldots,N-1\}} a_S \prod_{i \in S} x_i,$$

for some complex numbers $a_S$. The degree of $p$ is $\deg(p) = \max\{|S| : a_S \neq 0\}$. It is easy to show 
that every function $f : \{0,1\}^N \to \mathbb{C}$ has a unique representation as such a polynomial; $\deg(f)$ is 
defined as the degree of that polynomial (see Exercise 1). For example, the 2-bit AND function is 
$p(x_0, x_1) = x_0 x_1$, and the 2-bit Parity function is $p(x_0, x_1) = x_0 + x_1 - 2x_0x_1$. Both polynomials 
have degree 2. Sometimes a lower degree suffices for a polynomial to approximate the function. 
For example, $p(x_0, x_1) = \frac{1}{3}(x_0 + x_1)$ approximates the 2-bit AND function up to error 1/3 for all 
inputs, using degree 1. 
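The two examples above can be reproduced mechanically. The following sketch computes the coefficients $a_S$ of a function on $\{0,1\}^n$ by Möbius inversion over subsets, $a_S = \sum_{T \subseteq S} (-1)^{|S| - |T|} f(1_T)$, where $1_T$ is the input with ones exactly on $T$. This is one standard way to obtain the representation; the function names are hypothetical and the brute-force loop is only meant for small $n$.

```python
from itertools import combinations

def multilinear_coefficients(f, n):
    """Return {S: a_S} with f(x) = sum_S a_S * prod_{i in S} x_i on {0,1}^n.

    Moebius inversion: a_S = sum over T subset of S of (-1)^(|S|-|T|) * f(1_T).
    """
    coeffs = {}
    for size in range(n + 1):
        for S in combinations(range(n), size):
            total = 0
            for t_size in range(size + 1):
                for T in combinations(S, t_size):
                    x = tuple(1 if i in T else 0 for i in range(n))
                    total += (-1) ** (size - t_size) * f(x)
            if total != 0:
                coeffs[S] = total
    return coeffs

def degree(coeffs):
    """Largest |S| with a nonzero coefficient (0 for the zero polynomial)."""
    return max((len(S) for S in coeffs), default=0)

def AND(x):
    return x[0] & x[1]

def PARITY(x):
    return x[0] ^ x[1]

print(multilinear_coefficients(AND, 2))     # coefficient 1 on the monomial x0*x1
print(multilinear_coefficients(PARITY, 2))  # x0 + x1 - 2*x0*x1
```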

A very useful property of $T$-query algorithms is that the amplitudes of their final state are 
degree-$T$ $N$-variate polynomials of $x$ [113, 37]. More precisely: consider a $T$-query algorithm with 
input $x \in \{0,1\}^N$ acting on an $m$-qubit space. Then its final state can be written 

$$\sum_{z \in \{0,1\}^m} \alpha_z(x)|z\rangle,$$

where each $\alpha_z$ is a multilinear complex-valued polynomial in $x$ of degree at most $T$. 


Proof. The proof is by induction on $T$. The base case ($T = 0$) trivially holds: the algorithm’s 
state $U_0|0 \ldots 0\rangle$ is independent of $x$, so its amplitudes are constants. 

For the induction step, suppose we have already done $T$ queries. Then by the induction 
hypothesis the state after $U_T$ can be written as 

$$\sum_{z \in \{0,1\}^m} \alpha_z(x)|z\rangle,$$

where each $\alpha_z$ is a multilinear polynomial in $x$ of degree at most $T$. Each basis state $|z\rangle = |i, b, w\rangle$ 
consists of 3 registers: the two registers $|i, b\rangle$ of the query, and a workspace register containing basis 
state $|w\rangle$. The algorithm now makes another query $O_x$ followed by a unitary $U_{T+1}$. The query 
swaps basis states $|i, 0, w\rangle$ and $|i, 1, w\rangle$ if $x_i = 1$, and doesn’t do anything to these basis states if 
$x_i = 0$. This changes amplitudes as follows: 

$$\begin{aligned}
&\alpha_{i,0,w}(x)|i, 0, w\rangle + \alpha_{i,1,w}(x)|i, 1, w\rangle \mapsto \\
&\big((1 - x_i)\alpha_{i,0,w}(x) + x_i\alpha_{i,1,w}(x)\big)|i, 0, w\rangle + \big(x_i\alpha_{i,0,w}(x) + (1 - x_i)\alpha_{i,1,w}(x)\big)|i, 1, w\rangle.
\end{aligned}$$

Now the new amplitudes are of the form $(1 - x_i)\alpha_{i,0,w}(x) + x_i\alpha_{i,1,w}(x)$ or $x_i\alpha_{i,0,w}(x) + (1 - x_i)\alpha_{i,1,w}(x)$. 
The new amplitudes are still polynomials in $x_0, \ldots, x_{N-1}$. Their degree is at most 1 more than 
the degree of the old amplitudes, so at most $T + 1$. Finally, since $U_{T+1}$ is a linear map that is 
independent of $x$, it does not increase the degree of the amplitudes further (the amplitudes after 
$U_{T+1}$ are linear combinations of the amplitudes before $U_{T+1}$). This concludes the induction step. 

Note that this construction could introduce degrees higher than 1, e.g., terms of the form $x_i^2$. 
However, our inputs $x_i$ are 0/1-valued, so we have $x_i^k = x_i$ for all integers $k \geq 1$. Accordingly, we 
can reduce higher degrees to 1, making the polynomials multilinear without increasing degree. 


² Clearly, $N$ queries always suffice since we can just query each of the $N$ input bits separately, thus learning $x$ 
completely, and then look up and output whatever the correct value is for that input. 


Suppose our algorithm acts on an $m$-qubit state. If we measure the first qubit of the final state 
and output the resulting bit, then the probability of output 1 is given by 

$$p(x) = \sum_{z \in \{1\} \times \{0,1\}^{m-1}} |\alpha_z(x)|^2.$$

This is a real-valued polynomial of $x$ of degree at most $2T$, because $|\alpha_z(x)|^2$ is the sum of the squares 
of the real and imaginary parts of the amplitude $\alpha_z(x)$, and each of those two parts is a polynomial 
of degree $\leq T$. Note that if the algorithm computes $f$ with error $\leq 1/3$, then $p$ is an approximating 
polynomial for $f$: if $f(x) = 0$ then $p(x) \in [0, 1/3]$ and if $f(x) = 1$ then $p(x) \in [2/3, 1]$. This gives 
a method to lower bound the minimal number of queries needed to compute $f$: if one can show 
that every polynomial that approximates $f$ has degree at least $d$, then every quantum algorithm 
computing $f$ with error $\leq 1/3$ must use at least $d/2$ queries. 
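The degree bound can be checked numerically on a toy instance. The sketch below simulates a $T$-query algorithm on $N = 4$ input bits, tabulates its acceptance probability for all 16 inputs, and recovers the degree of the corresponding multilinear polynomial. Several simplifications are assumptions of this illustration only: the fixed unitaries are random products of real Givens rotations (a special case of unitaries), there is no workspace register, the basis state $|i, b\rangle$ is stored at index $2i + b$, and "output 1" means that the $b$-register is measured as 1.

```python
import math
import random
from itertools import combinations, product

def random_rotation_layer(dim, rng):
    """A list of (j, k, angle) Givens rotations; their product is a real unitary."""
    return [(j, k, rng.uniform(0, 2 * math.pi))
            for j in range(dim) for k in range(j + 1, dim)]

def apply_layer(layer, state):
    state = list(state)
    for j, k, theta in layer:
        c, s = math.cos(theta), math.sin(theta)
        state[j], state[k] = c * state[j] - s * state[k], s * state[j] + c * state[k]
    return state

def apply_query(x, state):
    """O_x : |i, b> -> |i, b XOR x_i>, with |i, b> stored at index 2*i + b."""
    out = [0.0] * len(state)
    for i in range(len(x)):
        for b in (0, 1):
            out[2 * i + (b ^ x[i])] += state[2 * i + b]
    return out

def acceptance_probability(x, layers):
    """Run U_T O_x ... O_x U_0 |0...0> and return Pr[b-register is 1]."""
    state = [1.0] + [0.0] * (2 * len(x) - 1)
    state = apply_layer(layers[0], state)
    for layer in layers[1:]:
        state = apply_layer(layer, apply_query(x, state))
    return sum(a * a for a in state[1::2])

def max_degree(values, n, tol=1e-9):
    """Degree of the multilinear polynomial agreeing with `values` on {0,1}^n."""
    deg = 0
    for size in range(n + 1):
        for S in combinations(range(n), size):
            a_S = sum((-1) ** (size - t) * values[tuple(1 if i in T else 0 for i in range(n))]
                      for t in range(size + 1) for T in combinations(S, t))
            if abs(a_S) > tol:
                deg = max(deg, size)
    return deg

rng = random.Random(0)
N, T = 4, 1
layers = [random_rotation_layer(2 * N, rng) for _ in range(T + 1)]
values = {x: acceptance_probability(x, layers) for x in product((0, 1), repeat=N)}
print("degree of acceptance polynomial:", max_degree(values, N), "(bound 2T =", 2 * T, ")")
```

The coefficients of all monomials of size 3 and 4 vanish up to floating-point error, even though nothing in the simulation refers to polynomials explicitly.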


Applications of the polynomial method. For our examples we will restrict attention to 
symmetric functions.³ Those are the ones where the function value $f(x)$ only depends on the Hamming 
weight (number of 1s) in the input $x$. Examples are $N$-bit OR, AND, Parity, Majority, etc. 

Suppose we have a polynomial $p(x_0, \ldots, x_{N-1})$ that approximates $f$ with error $\leq 1/3$. Then 
it is easy to see that a polynomial that averages over all permutations $\pi$ of the $N$ input bits 
$x_0, \ldots, x_{N-1}$: 

$$q(x) = \frac{1}{N!} \sum_{\pi \in S_N} p(\pi(x)),$$

still approximates $f$. As it turns out, we can define a single-variate polynomial $r(z)$ of the same 
degree as $q$, such that $q(x) = r(|x|)$.⁴ This $r$ is defined on all real numbers, and we know something 
about its behavior on integer points $\{0, \ldots, N\}$. Thus it suffices to lower bound the degree of 
single-variate polynomials with the appropriate behavior. 


³ One can also use the polynomial method for non-symmetric functions, for instance to prove a tight lower bound of 
$\Omega(N^{2/3})$ queries for the general problem of collision-finding; this matches the quantum walk algorithm of Section 8.3.2. 
However, that lower bound proof is substantially more complicated and we won’t give it here (see [4]). 

⁴ To see why this is the case, note that for every degree $i$, all degree-$i$ monomials in the symmetrized polynomial 
$q$ have the same coefficient $a_i$. Moreover, on input $x \in \{0,1\}^N$ of Hamming weight $z$, exactly $\binom{z}{i}$ of the degree-$i$ 
monomials are 1, while the others are 0. Hence $q(x) = \sum_{i=0}^{d} a_i \binom{|x|}{i}$. Since $\binom{z}{d} = z(z-1)\cdots(z-d+1)/d!$ is a 
single-variate polynomial in $z$ of degree $d$, we can define $r(z) = \sum_{i=0}^{d} a_i \binom{z}{i}$. For example, if $q(x_0, x_1) = x_0x_1$ then $r$ 
would be the unique univariate polynomial of degree at most 2 such that $r(0) = 0$, $r(1) = 0$ and $r(2) = 1$, i.e., $r(z) = z(z-1)/2 = \binom{z}{2}$. 
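The symmetrization step can be illustrated with exact arithmetic on a small example. The sketch below averages a polynomial over all permutations of its inputs and then fits the single-variate $r$ in the binomial basis $r(z) = \sum_i a_i \binom{z}{i}$. The example polynomial `p` and all function names are hypothetical; the brute-force sum over $N!$ permutations is only sensible for tiny $N$.

```python
from fractions import Fraction
from itertools import permutations, product
from math import comb, factorial

def symmetrize(p, n):
    """Return q(x) = (1/n!) * sum over permutations pi of p(pi(x))."""
    perms = list(permutations(range(n)))
    def q(x):
        total = sum(Fraction(p(tuple(x[pi[i]] for i in range(n)))) for pi in perms)
        return total / factorial(n)
    return q

def univariate_from_symmetric(q, n):
    """Return (r, a) with q(x) = r(|x|) and r(z) = sum_i a[i] * C(z, i)."""
    # Value of q on one representative input of each Hamming weight.
    by_weight = [q(tuple([1] * w + [0] * (n - w))) for w in range(n + 1)]
    # Triangular system: by_weight[w] = sum_{i <= w} a[i] * C(w, i).
    a = []
    for w in range(n + 1):
        a.append(by_weight[w] - sum(a[i] * comb(w, i) for i in range(w)))
    def r(z):
        return sum(a_i * comb(z, i) for i, a_i in enumerate(a))
    return r, a

# Hypothetical example polynomial on 3 bits: p(x) = x0 + 2*x1*x2.
def p(x):
    return x[0] + 2 * x[1] * x[2]

n = 3
q = symmetrize(p, n)
r, a = univariate_from_symmetric(q, n)
print("binomial-basis coefficients a_i:", a)
for x in product((0, 1), repeat=n):
    assert q(x) == r(sum(x))
```

The largest index $i$ with $a_i \neq 0$ is the degree of $r$, and it equals the degree of the symmetrized polynomial $q$.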

For an important example, consider the $N$-bit OR function. Grover’s algorithm can find an $i$ 
such that $x_i = 1$ (if such an $i$ exists) and hence can compute the OR function with error 
probability $\leq 1/3$ using $O(\sqrt{N})$ queries. By the above reasoning, any $T$-query quantum algorithm that 
computes the OR with error $\leq 1/3$ induces a single-variate polynomial $r$ of degree $\leq 2T$ satisfying 

$$r(0) \in [0, 1/3], \text{ and } r(t) \in [2/3, 1] \text{ for all integers } t \in \{1, \ldots, N\}.$$

This polynomial $r(x)$ “jumps” between $x = 0$ and $x = 1$ (i.e., it has a derivative $r'(x) \geq 1/3$ 
for some $x \in [0, 1]$), while it remains fairly constant on the domain $\{1, \ldots, N\}$. By a classical 
theorem from approximation theory (proved independently around the same time by Ehlich and 
Zeller [105], and by Rivlin and Cheney [212]), such polynomials must have degree $d \geq \Omega(\sqrt{N})$. 
Hence $T \geq \Omega(\sqrt{N})$ as well. Accordingly, Grover’s algorithm is optimal (up to a constant factor) in 
terms of number of queries. 

What about exact algorithms for OR? Could we tweak Grover’s algorithm so that it always 
finds a solution with probability 1 (if one exists), using $O(\sqrt{N})$ queries? This turns out not to be 
the case: a $T$-query exact algorithm for OR induces a polynomial $r$ of degree $\leq 2T$ that satisfies 

$$r(0) = 0, \text{ and } r(t) = 1 \text{ for all integers } t \in \{1, \ldots, N\}.$$

It is not hard to see that such a polynomial needs degree at least $N$: observe that $r(x) - 1$ is a 
non-constant polynomial with at least $N$ roots.⁵ Hence $T \geq N/2$ (this can be improved to $T \geq N$, 
see Exercise 5). Accordingly, Grover cannot be made exact without losing the square-root speed-up! 
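The degree claim for exact OR can also be seen by interpolation. The sketch below computes, with exact rational arithmetic, the degree of the unique lowest-degree polynomial through the points $(0, 0), (1, 1), \ldots, (N, 1)$, using Newton's divided differences. Any polynomial of degree at most $N$ with these values must equal this interpolant, so if the interpolant has degree exactly $N$ then no lower-degree polynomial fits. The function names are hypothetical and the check is an illustration for small $N$, not a replacement for the root-counting argument.

```python
from fractions import Fraction

def interpolation_degree(points):
    """Degree of the unique lowest-degree polynomial through the (t, value) points.

    Newton's divided differences: after round k, table[k] is the coefficient
    of the degree-k Newton basis polynomial (t - t_0)...(t - t_{k-1}).
    """
    ts = [Fraction(t) for t, _ in points]
    table = [Fraction(v) for _, v in points]
    degree = 0
    for k in range(1, len(points)):
        # Update from the back so table[j - 1] still holds the previous round.
        for j in range(len(points) - 1, k - 1, -1):
            table[j] = (table[j] - table[j - 1]) / (ts[j] - ts[j - k])
        if table[k] != 0:
            degree = k
    return degree

def exact_or_points(n):
    """Constraints on r for an exact OR algorithm: r(0) = 0 and r(t) = 1 for t = 1..n."""
    return [(0, 0)] + [(t, 1) for t in range(1, n + 1)]

for n in (2, 4, 8):
    print(n, interpolation_degree(exact_or_points(n)))
```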

Using the polynomial method, one can in fact show for every symmetric function $f$ that is 
defined on all $2^N$ inputs, that quantum algorithms cannot provide a more-than-quadratic speed-up 
over classical algorithms. More generally, for every function $f$ (symmetric or non-symmetric) that 
is defined on all inputs,⁶ quantum algorithms cannot provide a more-than-6th-root speed-up over 
classical algorithms (see Exercise 11). The polynomial method has recently been strengthened by 
Arunachalam et al. [26] to an optimal lower bound method, by imposing more constraints on the 
polynomial (which can increase the required degree, while still giving a lower bound on quantum 
query complexity). 


11.3 The quantum adversary method 


The polynomial method has a strength which is also a weakness: it applies even to a stronger (and 
less physically meaningful) model of computation where we allow any linear transformation on the 
state space, not just unitary ones. As a result, it does not always provide the strongest possible 
lower bound for quantum query algorithms. 

Ambainis [12, 13] provided an alternative method for quantum lower bounds, the quantum 
adversary. This exploits unitarity in a crucial way and in certain cases yields a provably better 
bound than the polynomial method [13]. We will present a very simple version of the adversary 
method here; a much stronger (in fact optimal!) version is given in Chapter 12. 


⁵ A “root” is an $x$ such that $r(x) = 0$. It is a well-known fact from algebra that every univariate non-constant 
polynomial of degree $d$ has at most $d$ roots (over any field). Note that this is not true for multivariate polynomials; 
for example the polynomial $x_0 \cdots x_{N-1}$ has $2^N - 1$ roots in $\{0,1\}^N$ but its degree is only $N$. 

⁶ Note that this doesn’t include functions where the input has to satisfy a certain promise, such as Deutsch-Jozsa 
and Simon’s problem. 


Recall that a quantum query algorithm is a sequence 

$$U_T O_x U_{T-1} O_x \cdots O_x U_1 O_x U_0,$$

applied to the fixed starting state $|0 \ldots 0\rangle$, where the basic “query transformation” $O_x$ depends 
on the input $x$, and $U_0, U_1, \ldots, U_T$ are arbitrary unitaries that don’t depend on $x$. Consider the 
evolution of our quantum state under all possible choices of $x$. Let $|\psi_x^t\rangle$ denote the state after 
applying $U_t$ when the input is $x$. In particular, $|\psi_x^0\rangle = U_0|0 \ldots 0\rangle$ for all $x$ (and hence $\langle\psi_x^0|\psi_y^0\rangle = 1$ 
for each $x, y$), and $|\psi_x^T\rangle$ is the final state of the algorithm on input $x$ before the final measurement. 
Now if the algorithm computes the Boolean function $f$ with success probability 2/3 on every input, 
then the final measurement must accept (i.e., output 1) every $x \in f^{-1}(0)$ with probability $\leq 1/3$, 
and must accept every $y \in f^{-1}(1)$ with probability $\geq 2/3$. This means the two states $|\psi_x^T\rangle$ and 
$|\psi_y^T\rangle$ cannot be too close together, or equivalently their inner product cannot be too close to 1. 
Specifically, we must have $|\langle\psi_x^T|\psi_y^T\rangle| \leq \frac{17}{18}$.⁷ This suggests that we find a set $R \subseteq f^{-1}(0) \times f^{-1}(1)$ 
of hard-to-distinguish $(x, y)$-pairs, and consider the following progress measure 


$$S_t = \sum_{(x,y) \in R} |\langle\psi_x^t|\psi_y^t\rangle|$$


as a function of $t$. By our observations, initially we have $S_0 = |R|$, and in the end we must have 
$S_T \leq \frac{17}{18}|R|$. Also, crucially, the progress measure is unaffected by each application of a unitary $U_t$, 
since each $U_t$ is independent of the input and unitary transformations preserve inner products. 

If we can determine an upper bound $\Delta$ on the change $|S_{t+1} - S_t|$ in the progress measure at 
each step, we can conclude that the number $T$ of queries is at least $\frac{|R|}{18\Delta}$. Ambainis proved the 
following. Suppose that 


(i) each $x \in f^{-1}(0)$ appearing in $R$, appears at least $m_0$ times in pairs $(x, y)$ in $R$; 

(ii) each $y \in f^{-1}(1)$ appearing in $R$, appears at least $m_1$ times in pairs $(x, y)$ in $R$; 

(iii) for each $x \in f^{-1}(0)$ and $i \in \{0, \ldots, N-1\}$, there are at most $\ell_0$ inputs $y \in f^{-1}(1)$ such that 
$(x, y) \in R$ and $x_i \neq y_i$; 

(iv) for each $y \in f^{-1}(1)$ and $i \in \{0, \ldots, N-1\}$, there are at most $\ell_1$ inputs $x \in f^{-1}(0)$ such that 
$(x, y) \in R$ and $x_i \neq y_i$. 


Then for all $t \geq 0$, $|S_{t+1} - S_t| = O\left(\sqrt{\frac{\ell_0}{m_0} \cdot \frac{\ell_1}{m_1}} \cdot |R|\right) =: \Delta$. We will not prove this inequality 
here, though it is a reasonably straightforward generalization of the answer to Exercise 12, and 
we will see a stronger result in the next chapter. This upper bound $\Delta$ on the progress-per-query 
immediately implies a lower bound on the number of queries: 


$$T = \Omega\left(\sqrt{\frac{m_0}{\ell_0} \cdot \frac{m_1}{\ell_1}}\right). \qquad (11.1)$$

Intuitively, conditions (i)-(iv) imply that $|S_{t+1} - S_t|$ is small relative to $|R|$ by bounding the 
“distinguishing ability” of any query. The art in applying this technique lies in choosing the relation $R$ 
carefully to maximize the bound of Eq. (11.1), i.e., make $m_0$ and/or $m_1$ large, while keeping $\ell_0$ and $\ell_1$ small. 


⁷ Remember Exercise 3 from Chapter 4 for states $|\phi\rangle$ and $|\psi\rangle$: if $\||\phi\rangle - |\psi\rangle\| = \varepsilon$, then the total variation distance 
between the probability distributions you get from measuring $|\phi\rangle$ and $|\psi\rangle$, respectively, is at most $\varepsilon$. Hence, if we 
know there is a two-outcome measurement that accepts $|\phi\rangle$ with probability $\leq 1/3$ and accepts $|\psi\rangle$ with probability 
$\geq 2/3$, then we must have total variation distance at least 1/3 and hence $\varepsilon \geq 1/3$. Assume for simplicity that the 
inner product $\langle\phi|\psi\rangle$ is real. Via the equation $\varepsilon^2 = \||\phi\rangle - |\psi\rangle\|^2 = 2 - 2\langle\phi|\psi\rangle$, this translates into an upper bound 
$|\langle\phi|\psi\rangle| \leq 1 - \varepsilon^2/2 \leq 17/18$ (this upper bound can be improved to $2\sqrt{2}/3$ with more careful analysis). 

Note that for the $N$-bit OR function this method easily gives the optimal $\Omega(\sqrt{N})$ lower bound, 
as follows. Choose $R = \{(x, y) : x = 0^N, y \text{ has Hamming weight } 1\}$. Then $m_0 = N$ while $m_1 = 
\ell_0 = \ell_1 = 1$. Plugging this into Eq. (11.1) gives the right $\Omega(\sqrt{N})$ bound. 

Let us give another application, a lower bound that is much harder to prove using the polynomial 
method. Suppose $f : \{0,1\}^N \to \{0,1\}$ is a 2-level AND-OR tree, with $N = k^2$ input bits: $f$ is the 
AND of $k$ ORs, each of which has its own set of $k$ input bits. By carefully doing 2 levels of Grover 
search (search for a subtree which is $0^k$), one can construct a quantum algorithm that computes $f$ 
with small error probability and $O(\sqrt{k} \cdot \sqrt{k}) = O(\sqrt{N})$ queries. It was long an open problem to give 
a matching lower bound on the approximate degree, and this was proved only in 2013 [226, 76]. In 
contrast, the adversary method gives the optimal lower bound on the quantum query complexity 
quite easily: choose the relation $R$ as follows 


$R$ consists of those pairs $(x, y)$ where 

$x$ has one subtree with input $0^k$ and the other $k - 1$ subtrees have an arbitrary $k$-bit 
input of Hamming weight 1 (note $f(x) = 0$) 

$y$ is obtained from $x$ by changing one of the bits of the $0^k$-subtree to 1 (note $f(y) = 1$). 


Then $m_0 = m_1 = k$ and $\ell_0 = \ell_1 = 1$, and we get a lower bound of $\Omega\left(\sqrt{\frac{m_0 m_1}{\ell_0 \ell_1}}\right) = \Omega(k) = \Omega(\sqrt{N})$. 
Another lower bound one can prove fairly easily using a strengthened version of the adversary 
method is for inverting a permutation, see Exercise 9. 
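The parameters $m_0, m_1, \ell_0, \ell_1$ of the two relations above can be verified by brute force for small sizes. The sketch below builds each relation explicitly as a list of pairs and counts according to conditions (i)-(iv); the function names are hypothetical, and the returned value $\sqrt{m_0 m_1/(\ell_0 \ell_1)}$ is the quantity inside the $\Omega(\cdot)$ of Eq. (11.1), without the hidden constant.

```python
from itertools import product
from math import sqrt

def adversary_parameters(R, n):
    """Compute (m0, m1, l0, l1) of conditions (i)-(iv) for a list R of (x, y) pairs."""
    xs = {x for x, _ in R}
    ys = {y for _, y in R}
    m0 = min(sum(1 for (a, _) in R if a == x) for x in xs)
    m1 = min(sum(1 for (_, b) in R if b == y) for y in ys)
    l0 = max(sum(1 for (a, b) in R if a == x and a[i] != b[i])
             for x in xs for i in range(n))
    l1 = max(sum(1 for (a, b) in R if b == y and a[i] != b[i])
             for y in ys for i in range(n))
    return m0, m1, l0, l1

def adversary_bound(m0, m1, l0, l1):
    return sqrt(m0 * m1 / (l0 * l1))

def or_relation(n):
    """x = 0^n paired with every y of Hamming weight 1."""
    zero = (0,) * n
    return [(zero, tuple(1 if j == i else 0 for j in range(n))) for i in range(n)]

def and_or_relation(k):
    """x: one all-zero block, other k-1 blocks of weight 1; y: set one bit of the zero block."""
    unit = [tuple(1 if j == i else 0 for j in range(k)) for i in range(k)]
    R = []
    for zero_block in range(k):
        for others in product(unit, repeat=k - 1):
            blocks = list(others[:zero_block]) + [(0,) * k] + list(others[zero_block:])
            x = tuple(bit for blk in blocks for bit in blk)
            for i in range(k):
                y = list(x)
                y[zero_block * k + i] = 1
                R.append((x, tuple(y)))
    return R

print(adversary_parameters(or_relation(5), 5))
params = adversary_parameters(and_or_relation(3), 9)
print(params, adversary_bound(*params))
```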


Exercises 


1. Consider a function $f : \{0,1\}^N \to \mathbb{R}$. Show that this function can be represented by an 
$N$-variate multilinear polynomial of degree $\leq N$, and that this representation is unique. 


2. Consider a 2-bit input $x = x_0x_1$ with phase-oracle $O_{x,\pm} : |i\rangle \mapsto (-1)^{x_i}|i\rangle$. Write out the final 
state of the following 1-query quantum algorithm: $HO_{x,\pm}H|0\rangle$. Give a degree-2 polynomial 
$p(x_0, x_1)$ that equals the probability that this algorithm outputs 1 on input $x$. What function 
does this algorithm compute? 


3. Consider polynomial $p(x_0, x_1) = 0.3 + 0.4x_0 + 0.5x_1$, which approximates the 2-bit OR 
function. Write down the symmetrized polynomial $q(x_0, x_1) = \frac{1}{2}(p(x_0, x_1) + p(x_1, x_0))$. Give a 
single-variate polynomial $r$ such that $q(x) = r(|x|)$ for all $x \in \{0,1\}^2$. 


4. (H) Let $f$ be the $N$-bit Parity function, which is 1 if its input $x \in \{0,1\}^N$ has odd Hamming 
weight, and 0 if the input has even Hamming weight (assume $N$ is an even number). 


(a) Give a quantum algorithm that computes Parity with success probability 1 on every 
input $x$, using $N/2$ queries. 


(b) Show that this is optimal, even for quantum algorithms that have error probability $\leq 1/3$ 
on every input. 


5. Suppose we have a $T$-query quantum algorithm that computes the $N$-bit AND function with 
success probability 1 on all inputs $x \in \{0,1\}^N$. In Section 11.2 we showed that such an 
algorithm has $T \geq N/2$ (we showed it for OR, but the same argument works for AND). 
Improve this lower bound to $T \geq N$. 


6. Consider the following 3-bit function $f : \{0,1\}^3 \to \{0,1\}$: 
$f(x_0, x_1, x_2) = 1$ if $x_0 = x_1 = x_2$, and $f(x_0, x_1, x_2) = 0$ otherwise. 


(a) How many queries does a classical deterministic algorithm need to compute f? Explain 
your answer. 


(b) Give a quantum algorithm that computes f with success probability 1 using 2 queries. 


(c) (H) Show that 2 queries is optimal: there is no quantum algorithm that computes f 
with success probability 1 using only 1 query. 


7. Let $f$ be the $N$-bit Majority function, which is 1 if its input $x \in \{0,1\}^N$ has Hamming weight 
$> N/2$, and 0 if the input has Hamming weight $\leq N/2$ (assume $N$ is even). 


(a) Prove that $\deg(f) \geq N/2$. What does this imply for the query complexity of exact 
quantum algorithms that compute majority? 


(b) (H) Use the adversary method to show that every bounded-error quantum algorithm 
for computing Majority needs $\Omega(N)$ queries. Be explicit about what relation $R$ you’re 
using, and about the values of the parameters $m_0, m_1, \ell_0, \ell_1$. 


8. Let $k$ be an odd natural number, $N = k^2$, and define the Boolean function $f : \{0,1\}^N \to \{0,1\}$ 
as the $k$-bit majority of $k$ separate $k$-bit OR functions. In other words, the $N$-bit input is 
$x = x^{(1)} \ldots x^{(k)}$ with $x^{(i)} \in \{0,1\}^k$ for each $i \in [k]$, and $f(x)$ is the majority value of the $k$ 
bits $\mathrm{OR}(x^{(1)}), \ldots, \mathrm{OR}(x^{(k)})$. Use the adversary method to prove that computing this $f$ with 
error probability $\leq 1/3$ requires $\Omega(N^{3/4})$ quantum queries. Be explicit about what relation 
$R$ you’re using, and about the values of the parameters $m_0, m_1, \ell_0, \ell_1$. 


9. This question is about the quantum complexity of inverting a permutation, which is an 
important problem in cryptography. Let $N$ be a power of 2 and $S = \{0, \ldots, N-1\}$. Let 
$x \in S^N$ correspond to a permutation on $S$, meaning that each $j \in S$ occurs exactly once as 
an entry of $x$ (so the map $i \mapsto x_i$ is a permutation). Suppose we can query $x$, i.e., we have a 
unitary $O_x$ that maps $|i, j\rangle \mapsto |i, x_i + j \bmod N\rangle$ for all $i, j \in S$, and we can also apply $O_x^{-1}$. 


(a) Show how we can find the unique index $i \in S$ for which $x_i = 0$, with success probability 
$\geq 2/3$, using $O(\sqrt{N})$ queries to $O_x$ and $O_x^{-1}$. 

(b) (H) The adversary lower bound of Section 11.3 still works with the following 
modifications: 
(1) the $x$’s and $y$’s are not binary strings, but strings over a larger alphabet, such as $S$, 
and (2) let $\ell_{x,i}$ be the number of $y \in Y$ such that $(x, y) \in R$ and $x_i \neq y_i$; $\ell_{y,i}$ be the 
number of $x \in X$ such that $(x, y) \in R$ and $x_i \neq y_i$; and $\ell_{\max} = \max\{\ell_{x,i} \cdot \ell_{y,i} : (x, y) \in 
R, i \in \{0, \ldots, N-1\}, x_i \neq y_i\}$. 
In this case the quantum query lower bound is $\Omega(\sqrt{m_0 m_1/\ell_{\max}})$. You may assume this 
without proof. 
Use this strengthened adversary bound to show a lower bound of $\Omega(\sqrt{N})$ quantum 
queries for computing the task of part (a). 


10. (H) Consider the sorting problem: there are $N$ numbers $a_1, \ldots, a_N$ and we want to sort these. 
We can only access the numbers by making comparisons. A comparison is similar to a 
black-box query: it takes 2 indices $i, j$ as input and outputs whether $a_i < a_j$ or not. The output 
of a sorting algorithm should be the list of $N$ indices, sorted in increasing order. It is known 
that for classical computers, $N \log(N) + O(N)$ comparisons are necessary and sufficient for 
sorting. Prove that a quantum algorithm needs at least $\Omega(N)$ comparisons for sorting, even 
if it is allowed an error probability $\leq 1/3$. 


11. Consider a total Boolean function $f : \{0,1\}^N \to \{0,1\}$. Given an input $x \in \{0,1\}^N$ and 
subset $B \subseteq \{0, \ldots, N-1\}$ of indices of variables, let $x^B$ denote the $N$-bit input obtained 
from $x$ by flipping all bits $x_i$ whose index $i$ is in $B$. The block sensitivity $bs(f, x)$ of $f$ at 
input $x$, is the maximal integer $k$ such that there exist disjoint sets $B_1, \ldots, B_k$ satisfying 
$f(x) \neq f(x^{B_i})$ for all $i \in [k]$. The block sensitivity $bs(f)$ of $f$ is $\max_x bs(f, x)$. 


(a) (H) Show that the bounded-error quantum query complexity of $f$ is $\Omega(\sqrt{bs(f)})$. 


(b) It is known that for every total Boolean function $f$, there is a classical deterministic 
algorithm that computes it using $O(bs(f)^3)$ many queries. What can you conclude 
from this and part (a) about the relation between deterministic and quantum query 
complexity for total functions? 


12. (H) In this exercise we will derive the quantum lower bound for the search problem in a 
self-contained way, without using the polynomial or adversary method (this exercise uses what is 
called the “hybrid method”). 


Let $N = 2^n$. Consider an input $x \in \{0,1\}^N$ that we can query. Assume $x$ has Hamming 
weight 0 or 1, and suppose we would like to find the unique solution to the search 
problem (if a solution exists). Let $A$ be any $T$-query quantum algorithm for this. Suppose for 
simplicity that the algorithm acts on only $n$ qubits (so there are no auxiliary qubits), and 
$A = U_T O_{x,\pm} U_{T-1} O_{x,\pm} \cdots U_1 O_{x,\pm} U_0$, so $A$ interleaves phase-queries to $x$ and unitaries that 
are independent of $x$. The initial state is $|0^n\rangle$. Let $|\psi_x^t\rangle$ denote the $n$-qubit state right after 
applying $U_t$, when we run $A$ on input $x$, so the final state is $|\psi_x^T\rangle$. Let $e_i \in \{0,1\}^N$ be the 
input that has a 1 only at position $i$. Assume the algorithm $A$ is successful in finding the right 
solution $i$ after $T$ queries in the following sense: $\||\psi_{e_i}^T\rangle - |i\rangle\| \leq 1/4$ and $\||\psi_{0^N}^T\rangle - |i\rangle\| \geq 3/4$ 
for all $i \in \{0, \ldots, N-1\}$ (note that the basic Grover algorithm is an example of such an $A$). 


(a) Consider the run of algorithm $A$ on input $x = 0^N$, and for $t \in \{0, \ldots, T-1\}$ let the 
amplitudes $\alpha_{t,i}$ be such that $|\psi_{0^N}^t\rangle = \sum_{i=0}^{N-1} \alpha_{t,i}|i\rangle$. 
Prove that $\||\psi_{0^N}^1\rangle - |\psi_{e_i}^1\rangle\| \leq 2|\alpha_{0,i}|$, for all $i \in \{0, \ldots, N-1\}$. 

(b) Prove that $\||\psi_{0^N}^T\rangle - |\psi_{e_i}^T\rangle\| \leq 2\sum_{t=0}^{T-1} |\alpha_{t,i}|$, for all $i \in \{0, \ldots, N-1\}$. 

(c) Prove that $1/2 \leq \||\psi_{0^N}^T\rangle - |\psi_{e_i}^T\rangle\|$, for all $i \in \{0, \ldots, N-1\}$. 

(d) Prove that $T \geq \sqrt{N}/4$. 


13. Consider a standard quantum query algorithm: it makes $T$ queries to a string $x \in \{0,1\}^N$, 
with arbitrary unitaries $U_0, U_1, \ldots, U_T$ (that are independent of $x$) around the queries, and 
then measures a POVM $\{M, I - M\}$ on the final $m$-qubit state $|\psi_x\rangle$. 


(a) Show that the probability $P(x)$ of getting the first measurement outcome (on input $x$) 
is $\langle\psi_x|M|\psi_x\rangle$, and that this can be written as an $N$-variate multilinear polynomial in 
the bits of $x$ of degree $\leq 2T$. 


(b) (H) A $k$-wise independent distribution $D$ is a probability distribution over $\{0,1\}^N$, such 
that for each set $S \subseteq [N]$ of at most $k$ coordinates, the distribution on the $k$-bit substring 
$x_S = (x_i)_{i \in S}$ is uniformly random (i.e., for each $z \in \{0,1\}^k$, the probability under 
distribution $D$ of the event that $x_S = z$, is $1/2^k$). 

Show that a $T$-query quantum algorithm cannot distinguish the uniform distribution 
$U$ on its input $x$ from a $2T$-wise independent distribution $D$ on $x$, in the sense that 
no matter what binary measurement the algorithm does at the end, the probability of 
output 1 is the same under $U$ and under $D$. 


Chapter 12 


Quantum Algorithms from the 
Generalized Adversary Bound 


12.1 The generalized adversary bound 


In the previous chapter we saw two different lower bound methods for the quantum query complexity 
of a given function f: the polynomial method and the adversary method. Neither is optimal for 
every possible f. For example, the polynomial method doesn’t give optimal lower bounds for 
iterations of some small functions [13] (see Exercise 1), while the adversary bound of Section 11.3 
cannot prove optimal lower bounds for instance for distinguishing 2-to-1 from 1-to-1 inputs. 

In this chapter we will look at a stronger version of the adversary bound, which turns out to 
give optimal quantum query complexity lower bounds for all Boolean functions. The beauty of 
an optimal lower bound method is that it can also produce algorithms: if the best-possible lower 
bound on the query complexity of f is T, then there must actually exist a T-query algorithm for f! 

Suppose $f : D \to \{0,1\}$, with $D \subseteq \{0,1\}^N$, is a Boolean function whose quantum query
complexity we’d like to determine.¹ Consider a $T$-query quantum algorithm $A = U_T O_x U_{T-1} \cdots U_1 O_x U_0$,
with initial state $|0^m\rangle$, that computes $f$ with error probability $\le \varepsilon < 1/2$ for each $x \in D$. Let $|\psi_x^t\rangle$
denote the algorithm’s state after $U_t$ has been applied, given input $x$. Note that $|\psi_x^0\rangle = U_0|0^m\rangle$ is
independent of $x$. The crucial property, already used in the earlier version of the adversary bound,
is that $\langle\psi_x^t|\psi_y^t\rangle$ is 1 at the start ($t = 0$), but has to be small at the end ($t = T$) for every $(x, y)$-pair
with different function values $f(x) \neq f(y)$.

The generalized adversary matrix puts weights $a_x \in \mathbb{C}$ on the inputs $x \in D$, with the constraint
$\sum_{x\in D} |a_x|^2 = 1$ for normalization. It also puts real (but possibly negative!) weights $\Gamma_{xy}$ on $(x, y)$-pairs
with different function values. We impose the constraint that $\Gamma_{xy} = \Gamma_{yx}$, and $\Gamma_{xy} = 0$ whenever
$f(x) = f(y)$. A $|D| \times |D|$ matrix $\Gamma$ with these properties is called an adversary matrix.

Let us use these weights to define a progress measure:²


$$S_t = \sum_{x,y\in D} \Gamma_{xy}\, a_x^* a_y\, \langle\psi_x^t|\psi_y^t\rangle.$$


In the same spirit as Section 11.3, we will show that $|S_0|$ is large, that $|S_T|$ is much smaller, and
that $S_{t+1}$ can’t be too different from $S_t$ (i.e., $S_t$ can’t change very fast if we spend one more query).
This will give the lower bound on $T$.


¹ If the domain is $D = \{0,1\}^N$ then $f$ is called a total function, otherwise it’s a partial function.
² We could have just absorbed the $a_x^* a_y$ into the $\Gamma_{xy}$ and dispensed with the $a_x$’s altogether, but it will be cleaner
to have separate weights on the $x$’s and separate weights $\Gamma_{xy}$ on the pairs.

At the start of the algorithm ($t = 0$), before any queries have been made, we have $\langle\psi_x^0|\psi_y^0\rangle = 1$
for all $x, y$ and hence $S_0 = \sum_{x,y\in D} \Gamma_{xy} a_x^* a_y = a^*\Gamma a$. Since $a$ is restricted to a unit vector, the biggest
we can make $|S_0|$ is

$$|S_0| = \|\Gamma\|,$$

the operator norm (largest singular value) of $\Gamma$, by choosing $a$ to be an eigenvector of $\Gamma$ corresponding
to the largest eigenvalue in absolute value.

At the end of the algorithm ($t = T$), the final states $|\psi_x^T\rangle$ and $|\psi_y^T\rangle$ must be distinguishable
with success probability $\ge 1 - \varepsilon$ whenever $f(x) \neq f(y)$. The following claim, proved in Exercise 2,
shows that this forces $|S_T|$ to be significantly smaller than $|S_0|$.


Claim 1 $|S_T| \le 2\sqrt{\varepsilon(1-\varepsilon)}\,\|\Gamma\|$.


For example, if our algorithm has error probability $\varepsilon = 1/3$, then $|S_T| \le 0.95\,\|\Gamma\|$. Accordingly, the
progress measure has to change significantly in the course of the $T$-query algorithm. How much can
one more query change $S_t$? This is upper bounded by the following claim, proved in Exercise 3.


Claim 2 Let $\Gamma_i$ denote the $|D| \times |D|$ matrix obtained from $\Gamma$ by setting $\Gamma_{xy}$ to 0 if $x_i = y_i$. For all
$t \in \{0, \ldots, T-1\}$ we have $|S_t - S_{t+1}| \le 2\max_{i\in[N]} \|\Gamma_i\|$.


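These two claims, together with our value for the initial $|S_0|$, imply


$$\left(1 - 2\sqrt{\varepsilon(1-\varepsilon)}\right)\|\Gamma\| \le |S_0| - |S_T| \le |S_0 - S_T| = \left|\sum_{t=0}^{T-1}(S_t - S_{t+1})\right| \le \sum_{t=0}^{T-1}|S_t - S_{t+1}| \le 2T\max_{i\in[N]}\|\Gamma_i\|.$$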


We get the following lower bound on quantum query complexity, due to Høyer, Lee, and Špalek [139]: 


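Theorem 2 (Generalized adversary bound) Let $f : D \to \{0,1\}$, with $D \subseteq \{0,1\}^N$, and $\Gamma$
be an adversary matrix for $f$. Every quantum algorithm that computes $f$ with worst-case error
probability $\le \varepsilon$ needs at least $\left(\frac{1}{2} - \sqrt{\varepsilon(1-\varepsilon)}\right)\frac{\|\Gamma\|}{\max_{i\in[N]}\|\Gamma_i\|}$ queries.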


As an example, let us (again) prove the $\Omega(\sqrt{N})$ lower bound for search. Consider the domain
$D = \{0^N, e_1, \ldots, e_N\}$ of inputs of weight 0 or 1 ($e_j$ is the $N$-bit string that has a 1 only at position $j$).
Define the $(N+1) \times (N+1)$ adversary matrix


$$\Gamma = \begin{pmatrix} 0 & 1 & \cdots & 1\\ 1 & 0 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 1 & 0 & \cdots & 0 \end{pmatrix}.$$


Then $\|\Gamma\| \ge \|\Gamma(1,0,\ldots,0)^T\| = \sqrt{N}$. Each $\Gamma_i$ is the $2 \times 2$ $X$-matrix padded with extra rows and
columns of 0s, so $\|\Gamma_i\| = 1$. Hence we obtain the familiar lower bound of $\Omega(\sqrt{N})$.
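
The search example can be checked numerically. The following is a minimal sketch in plain Python, under the assumption that index 0 of the matrix stands for the input $0^N$ and index $j$ for $e_j$; the helper names (`op_norm`, `search_adversary`, `restrict`, `adversary_lower_bound`) are illustrative and not part of the text. It estimates the operator norms by power iteration and evaluates the bound of Theorem 2.

```python
import math

def matvec(M, v):
    """Return the matrix-vector product M v for a list-of-lists matrix M."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def op_norm(M, iters=50):
    """Estimate the operator norm of a real symmetric matrix M.

    Runs power iteration on M^2 (which equals M^T M for symmetric M) and
    returns ||M v|| for the resulting unit vector v.
    """
    v = [1.0 + 0.01 * i for i in range(len(M))]  # arbitrary nonzero start vector
    for _ in range(iters):
        u = matvec(M, matvec(M, v))
        norm = math.sqrt(sum(a * a for a in u))
        if norm == 0.0:
            return 0.0
        v = [a / norm for a in u]
    return math.sqrt(sum(a * a for a in matvec(M, v)))

def search_adversary(N):
    """Adversary matrix for search on D = {0^N, e_1, ..., e_N} (index 0 is 0^N)."""
    G = [[0.0] * (N + 1) for _ in range(N + 1)]
    for j in range(1, N + 1):
        G[0][j] = G[j][0] = 1.0
    return G

def restrict(G, N, i):
    """Gamma_i: keep entry (x, y) only if x and y differ in bit i (1 <= i <= N)."""
    def bit(idx):
        # Bit i of the input with index idx: only e_i has a 1 at position i.
        return 1 if idx == i else 0
    return [[G[a][b] if bit(a) != bit(b) else 0.0 for b in range(N + 1)]
            for a in range(N + 1)]

def adversary_lower_bound(N, eps=1 / 3):
    """Evaluate (1/2 - sqrt(eps(1-eps))) * ||Gamma|| / max_i ||Gamma_i||."""
    G = search_adversary(N)
    top = op_norm(G)
    worst = max(op_norm(restrict(G, N, i)) for i in range(1, N + 1))
    return (0.5 - math.sqrt(eps * (1 - eps))) * top / worst
```

Power iteration is used here only to keep the sketch free of external libraries; it converges for these particular matrices because their squares have a single nonzero eigenvalue, but a general-purpose implementation would use a proper eigenvalue routine.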

More generally, we can recover the lower bound of Section 11.3 by constructing an appropriate
adversary matrix $\Gamma$ based on the relation $R$. The rows and columns of $\Gamma$ are indexed by $D =
f^{-1}(0) \cup f^{-1}(1)$. Define $d_x = |\{y : (x,y) \in R\}|$ and similarly define $d_y$. Define $\Gamma_{xy} = 1/\sqrt{d_x d_y}$ if
$(x,y) \in R$ or $(y,x) \in R$, and $\Gamma_{xy} = 0$ otherwise. Let $v$ be the vector with entries $\sqrt{d_x}$, and note
that $\|v\|^2 = 2|R|$, since every pair in $R$ is counted once from each of its two sides. We have

$$\|\Gamma\| \ge \frac{v^T\Gamma v}{\|v\|^2} = \frac{1}{2|R|}\sum_{x,y:\ (x,y)\in R \text{ or } (y,x)\in R} 1 = 1.$$

We can also show $\|\Gamma_i\| \le \sqrt{\frac{\ell_0\ell_1}{m_0 m_1}}$ for all $i$.³ Now Theorem 2 gives the $\Omega\left(\sqrt{\frac{m_0 m_1}{\ell_0\ell_1}}\right)$ bound of Eq. (11.1).


12.2 The dual of the generalized adversary bound 


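Consider the best-possible lower bound that we can obtain by optimizing over adversary matrices $\Gamma$:


$$\begin{array}{ll}
\max & \|\Gamma\|\\
\text{s.t.} & \|\Gamma_i\| \le 1 \quad \forall i \in [N]\\
 & \Gamma \text{ is symmetric}\\
 & \Gamma_{xy} = 0 \quad \forall x, y \in D \text{ with } f(x) = f(y)
\end{array}$$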


We will call the optimal value the generalized adversary bound for $f$ (a.k.a. the negative-weights
adversary bound), and denote it by $\mathrm{ADV}^{\pm}(f)$. Because the statement $\|\Gamma_i\| \le 1$ is equivalent to
the matrix inequality $-I \preceq \Gamma_i \preceq I$, the above maximization problem can be written in the form
of a so-called semidefinite program: an optimization problem over real-valued variables, typically
arranged in one or more matrices, with an objective function that’s linear in the variables, and
psd constraints that are linear in the variables as well. Every maximization-SDP has an associated
minimization-SDP which (under mild assumptions that hold in our case) has the same optimal
value. The first SDP is called the primal SDP, the second is called the dual SDP.⁴ The equality of
the optimal values of these two SDPs is called strong duality (see Exercise 4 for a proof of the easy
half of this equality). This generalizes the better-known strong duality of linear programs, which
correspond to SDPs with diagonal matrices.

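With some effort that we will skip here, one can show that the dual of the above maximization-SDP
can be written as the following minimization-SDP:


$$\begin{array}{ll}
\min & \displaystyle \max_{x\in D}\ \max\Big\{\sum_{j\in[N]}\|u_{xj}\|^2,\ \sum_{j\in[N]}\|v_{xj}\|^2\Big\}\\[2mm]
\text{s.t.} & \displaystyle \sum_{j:\,x_j\neq y_j}\langle u_{xj}|v_{yj}\rangle = [f(x)\neq f(y)] \quad \forall x, y \in D
\end{array}$$
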
The truth-value $[f(x) \neq f(y)]$ is 1 if $f(x) \neq f(y)$, and 0 if $f(x) = f(y)$. This SDP associates with
every $x \in D$ and every $j \in [N]$ two vectors $u_{xj}$ and $v_{xj}$ (of some dimension $d$ that will implicitly be
optimized over). We can write this more explicitly as an optimization problem over psd matrices
by defining, for each $j \in [N]$, a $2|D| \times 2|D|$ psd matrix $Z_j$ whose entries are given by the pairwise
inner products of the $2|D|$ vectors $u_{xj}, v_{xj}$ (these vectors are the “Gram vectors” of $Z_j$). Then the
optimization is over psd matrices $Z_1, \ldots, Z_N$, the objective function is the largest diagonal entry
of the matrix $\sum_{j\in[N]} Z_j$, and the constraints are linear functions of the entries of the $Z_j$’s.


³ Define two $|D| \times |D|$ matrices $A, B$ by $A_{xy} = 1/\sqrt{d_x}$ if $x_i \neq y_i$ and ($(x,y) \in R$ or $(y,x) \in R$), and $A_{xy} = 0$
otherwise; and $B_{xy} = 1/\sqrt{d_y}$ if $x_i \neq y_i$ and ($(x,y) \in R$ or $(y,x) \in R$), and $B_{xy} = 0$ otherwise. Note that $\Gamma_i = A \circ B$,
where ‘$\circ$’ denotes the entrywise product. For matrices with nonnegative entries, it is known [231, Appendix A] that
$\|A \circ B\|$ is at most the largest norm among the rows of $A$ (which is $\le \sqrt{\ell_0/m_0}$, because each row of $A$ has at most
$\ell_0$ nonzero entries, each of which is at most $1/\sqrt{m_0}$) times the largest norm among the columns of $B$ (which is
$\le \sqrt{\ell_1/m_1}$). Hence $\|\Gamma_i\| \le \sqrt{\frac{\ell_0\ell_1}{m_0 m_1}}$.
⁴ Strictly speaking the maximum in the primal should be a supremum and the minimum in the dual should be an
infimum, because the optimal value need not always be attained; it could be that there’s only an infinite sequence of
feasible solutions whose objective values converge to the optimum without ever reaching it.


By strong duality, the optimal value of this minimization-SDP is $\mathrm{ADV}^{\pm}(f)$ as well. A feasible
solution $\Gamma$ to the primal gives a lower bound on $\mathrm{ADV}^{\pm}(f)$, while a feasible solution $\{u_{xj}, v_{xj}\}_{x\in D, j\in[N]}$
for the dual gives an upper bound on $\mathrm{ADV}^{\pm}(f)$. The central result of this chapter is that $\mathrm{ADV}^{\pm}(f)$
is not only a lower bound on the quantum query complexity of $f$ (which follows from Theorem 2)
but also an upper bound, as we will see in the next section. This means that a feasible solution to
the dual SDP actually gives us an algorithm for $f$!


12.3 $\mathrm{ADV}^{\pm}$ is an upper bound on quantum query complexity


In this section we will construct a bounded-error quantum algorithm for computing $f$, derived from
a feasible solution $\{u_{xj}, v_{xj}\}_{x\in D, j\in[N]}$ of the dual SDP for $\mathrm{ADV}^{\pm}(f)$.⁵ Let’s say the objective value
of this feasible solution is $A$; the query complexity of our algorithm will turn out to be $O(A)$. Below
$x, y$ always range over $D$, and $j$ always ranges over $[N]$.

Our algorithm will act on 3 registers. The first register is spanned by $|j\rangle$, $j \in [N]$, the second
is 1 qubit, and the third contains the states $|v_{xj}\rangle \in \mathrm{span}\{|1\rangle, \ldots, |d\rangle\}$ and the special state $|0\rangle$ (so
the third register has $\lceil\log(d+1)\rceil$ qubits). For each $x$, define the following two 3-register states:


$$|t_x^+\rangle = \frac{1}{\sqrt{2}}\big(|0\rangle|0\rangle|0\rangle + |1\rangle|f(x)\rangle|0\rangle\big) \quad\text{and}\quad |t_x^-\rangle = \frac{1}{\sqrt{2}}\big(|0\rangle|0\rangle|0\rangle - |1\rangle|f(x)\rangle|0\rangle\big).$$


The algorithm starts with the all-0 state


$$|0\rangle|0\rangle|0\rangle = \frac{1}{\sqrt{2}}\big(|t_x^+\rangle + |t_x^-\rangle\big).$$


The goal of the algorithm is to (approximately) multiply $|t_x^+\rangle$ with $+1$ and $|t_x^-\rangle$ with $-1$, which
rather magically gives a final state that tells us $f(x)$:


$$\frac{1}{\sqrt{2}}\big(|t_x^+\rangle - |t_x^-\rangle\big) = |1\rangle|f(x)\rangle|0\rangle.$$


The key will be to use phase estimation (Section 4.6) with a well-chosen unitary $U_x$ that depends
on $x$, to distinguish $|t_x^+\rangle$ and $|t_x^-\rangle$. Define $U_x = (2\Pi_x - I)(2\Lambda - I)$ as the product of two reflections:


• Consider (unnormalized) states $|\psi_y\rangle = \frac{0.02}{\sqrt{A}}|t_y^-\rangle - \sum_j |j\rangle|\bar{y}_j\rangle|v_{yj}\rangle$, where $\bar{y}_j = 1 - y_j$.
Note that $\||\psi_y\rangle\| \le \frac{0.02}{\sqrt{A}} + \sqrt{A}$, using triangle inequality and the fact that $\sum_j \|v_{yj}\|^2 \le A$. Let
$\Lambda$ be the projector on the subspace that is orthogonal to the span of these $|\psi_y\rangle$’s, and $2\Lambda - I$
be the reflection through this subspace. In other words, the unitary $2\Lambda - I$ puts a $-$ in front
of all $|\psi_y\rangle$’s and leaves states alone if they are orthogonal to all $|\psi_y\rangle$’s. This reflection costs
no queries to implement, since it doesn’t depend on the actual input $x$.


• Let $\Pi_x$ be the projector on the subspace spanned by states that have $|j\rangle|x_j\rangle$ in their first two
registers (with arbitrary states in the third register) and by states having $|0\rangle$ in their third
register. Then the reflection $2\Pi_x - I$ through this subspace puts a $-$ in front of states $|j\rangle|\bar{x}_j\rangle|v\rangle$
if $\langle v|0\rangle = 0$, and leaves the states alone that are in the subspace of $\Pi_x$. This reflection can
be implemented with 1 query to $x$ (Exercise 5).


⁵ Our presentation follows the approach of [164], modified and simplified for our special case of computing Boolean
functions rather than their more general case of state-transformation. For other generalizations of the adversary bound
to different scenarios, see [44]. Reichardt’s first proof of the optimality of the generalized adversary bound [205, 207]
went through so-called “span programs,” but we won’t need those here.


We now relate $|t_x^+\rangle$ and $|t_x^-\rangle$ to the eigenstates of this unitary $U_x$. In the next two claims, the
informal “close” should be read as “within small constant Euclidean distance.”


Claim 3 $|t_x^+\rangle$ is close to an eigenstate $|\phi\rangle$ of $U_x$ that has eigenvalue 1 (i.e., phase 0).


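Proof. Define

$$|\phi\rangle = |t_x^+\rangle + \frac{0.01}{\sqrt{A}}\sum_j |j\rangle|x_j\rangle|u_{xj}\rangle.$$

The second term on the right-hand side has norm $\le 0.01$ because $\sum_j \|u_{xj}\|^2 \le A$, so $|t_x^+\rangle$ is indeed
close to $|\phi\rangle$ ($|\phi\rangle$’s norm is close to but not equal to 1, but this doesn’t matter).
Note that $\langle\phi|\psi_y\rangle = 0$ for all $y$, because $\langle t_x^+|t_y^-\rangle = \frac{1}{2}[f(x) \neq f(y)]$ and

$$\Big(\sum_j \langle j|\langle x_j|\langle u_{xj}|\Big)\Big(\sum_j |j\rangle|\bar{y}_j\rangle|v_{yj}\rangle\Big) = \sum_j \langle x_j|\bar{y}_j\rangle\langle u_{xj}|v_{yj}\rangle = \sum_{j:\,x_j\neq y_j}\langle u_{xj}|v_{yj}\rangle = [f(x) \neq f(y)],$$

and because $|t_x^+\rangle, |t_y^-\rangle$ have $|0\rangle$ in the third register and so are orthogonal to all $|j\rangle|b\rangle|u_{xj}\rangle$, $|j\rangle|b\rangle|v_{yj}\rangle$,
$b \in \{0,1\}$. This shows that $|\phi\rangle$ lies in the subspace of $\Lambda$, so it is an eigenvalue-1 eigenvector of $2\Lambda - I$.
Also, $|\phi\rangle$ is a linear combination of $|t_x^+\rangle$ (which has $|0\rangle$ in its third register) and states that have
$|j\rangle|x_j\rangle$ in their first two registers, so $|\phi\rangle$ lies in the subspace of $\Pi_x$ and hence is also an eigenvalue-1
eigenvector of $2\Pi_x - I$. Hence $|\phi\rangle$ is an eigenvalue-1 eigenvector of $U_x = (2\Pi_x - I)(2\Lambda - I)$.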


Claim 4 $|t_x^-\rangle$ is close to a superposition of eigenstates of $U_x$ with eigenvalues of the form $e^{i\theta}$ with
$\theta \in (-\pi, \pi]$ and $|\theta| > \Theta = 1/(1000A)$.


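Proof. Let $\{|\beta\rangle\}$ be a complete orthonormal set of eigenvectors of $U_x$, with respective eigenvalues
$e^{i\theta_\beta}$, $\theta_\beta \in (-\pi, \pi]$. Let $P_\Theta = \sum_{\beta:\,|\theta_\beta|\le\Theta} |\beta\rangle\langle\beta|$ be the projector on the eigenvectors with small
eigenphase. Define vectors $w = 100\sqrt{A}\,|\psi_x\rangle$ and $v = P_\Theta\Pi_x w = 2P_\Theta|t_x^-\rangle$ (using that
$\Pi_x|\psi_x\rangle = \frac{0.02}{\sqrt{A}}|t_x^-\rangle$). Our goal is to show that $v$ has small norm.
To that end, define $v' = (2\Lambda - I)v$ and $v'' = (2\Pi_x - I)v' = U_x v$, and note that


$$\|v - v''\|^2 = \Big\|\sum_{\beta:\,|\theta_\beta|\le\Theta}(1 - e^{i\theta_\beta})\langle\beta|v\rangle|\beta\rangle\Big\|^2 = \sum_{\beta:\,|\theta_\beta|\le\Theta}|1 - e^{i\theta_\beta}|^2\,|\langle\beta|v\rangle|^2 \le 2(1 - \cos\Theta)\|v\|^2 \le \Theta^2\|v\|^2.$$


Because $v + v' = 2\Lambda v$, the vector $v + v'$ lies in the subspace corresponding to $\Lambda$ and hence is
orthogonal to $|\psi_x\rangle$ and to $w = 100\sqrt{A}\,|\psi_x\rangle$. We then have


$$0 = \langle v + v'|w\rangle = \langle v + v'|\Pi_x|w\rangle + \langle v + v'|(I - \Pi_x)|w\rangle = \langle v + v''|\Pi_x|w\rangle + \langle v - v''|(I - \Pi_x)|w\rangle,$$

where the last equality used that $v' + v'' = 2\Pi_x v'$ (hence $\Pi_x(v' + v'') = 2\Pi_x v'$ and so $\Pi_x v' = \Pi_x v''$)
and $v' - v'' = 2(I - \Pi_x)v'$ (hence $(I - \Pi_x)v' = -(I - \Pi_x)v''$). We can now upper bound $\|v\|$ by


$$\begin{aligned}
\|v\|^2 &= |\langle v|P_\Theta\Pi_x|w\rangle| = |\langle v|\Pi_x|w\rangle|\\
&= \frac{1}{2}\big|\langle v - v''|\Pi_x|w\rangle + \langle v + v''|\Pi_x|w\rangle\big| = \frac{1}{2}\big|\langle v - v''|(2\Pi_x - I)|w\rangle\big|\\
&\le \frac{1}{2}\|v - v''\|\cdot\|w\| \le \frac{1}{2}\,\Theta\,\|v\|\cdot 100\sqrt{A}\,\||\psi_x\rangle\|\\
&\le \frac{1}{2000A}\,\|v\|\cdot 100\sqrt{A}\left(\frac{0.02}{\sqrt{A}} + \sqrt{A}\right) = \left(\frac{1}{20} + \frac{1}{1000A}\right)\|v\|,
\end{aligned}$$


where the first inequality is Cauchy-Schwarz. This implies $\|v\| \le 1/20 + 1/(1000A)$, which is small
(we may assume $A \ge 1$).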


Phase estimation with precision $\Theta/2$ can distinguish between these two cases (eigenphase 0 vs
$> \Theta$). This allows us to put a $+$ in front of $|t_x^+\rangle$ and a $-$ in front of $|t_x^-\rangle$: run phase estimation,
multiply with $-1$ whenever the absolute value of the phase estimate is $> \Theta/2$, and then invert
the phase estimation. Phase estimation uses $O(1/\Theta) = O(A)$ applications of $U_x$, and hence $O(A)$
queries to $x$, as promised. There are small errors in this process due to the fact that Claims 3 and 4
say “close to” rather than “equal,” and due to the small approximation errors of phase estimation.
Accordingly, our final state will be close to $|1\rangle|f(x)\rangle|0\rangle$ but not necessarily equal to it, and we end up
with an $O(A)$-query quantum algorithm for $f$ that has a small error probability.

It should be noted that the upper bound is on the algorithm’s query complexity, not on its
gate complexity. The number of gates of the algorithm is $O(A)$ times the number of gates needed
to implement the reflection $2\Lambda - I$ (the reflection $2\Pi_x - I$ is relatively easy to implement, see
Exercise 5). In general this number of gates could be very large, though in some cases it can be
made quite small, for instance [17].


12.4 Applications 


Let us see how we can derive an $O(\sqrt{N})$-query quantum algorithm for the $N$-bit OR function from
the dual adversary, for the special case where the $N$-bit input $x$ is promised to have at most one
1-bit. Consider the set $D = \{0^N, e_1, \ldots, e_N\}$ of possible inputs. For the dual adversary bound
we need to choose vectors $u_{xj}, v_{xj}$ for each $x \in D$ and $j \in [N]$. Here vectors of dimension 1 (i.e.,
numbers) already suffice: we define $u_{0^N j} = v_{0^N j} = 1/N^{1/4}$ for all $j$, and $u_{e_k j} = v_{e_k j} = N^{1/4}$ if
$j = k$ and $u_{e_k j} = v_{e_k j} = 0$ otherwise. For the objective function, note that for each $x$, $\sum_j u_{xj}^2 =
\sum_j v_{xj}^2 = \sqrt{N}$; for $x = 0^N$ this is because each of the $N$ $j$’s contributes $(1/N^{1/4})^2 = 1/\sqrt{N}$ to the
sum, while for $x = e_k$ there is only one nonzero contribution, namely $(N^{1/4})^2 = \sqrt{N}$ for $j = k$. It
is also easy to verify that $\sum_{j:\,x_j\neq y_j} u_{xj}v_{yj} = [f(x) \neq f(y)]$ for all $x, y \in D$.
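
The feasibility and objective value of this dual solution can be checked mechanically. Below is a minimal sketch in plain Python; the function names `or_dual_solution` and `check_dual` are hypothetical, inputs are represented as tuples of bits with 0-based positions, and since $u = v$ here only one table of numbers is stored.

```python
import math

def or_dual_solution(N):
    """Return (inputs, u) for the promise-OR domain D = {0^N, e_1, ..., e_N}.

    u[(x, j)] is the 1-dimensional vector (a number) u_{xj}; we take v = u.
    """
    zero = tuple([0] * N)
    inputs = [zero] + [tuple(int(j == k) for j in range(N)) for k in range(N)]
    u = {}
    for x in inputs:
        for j in range(N):
            if x == zero:
                u[(x, j)] = N ** -0.25
            else:
                u[(x, j)] = N ** 0.25 if x[j] == 1 else 0.0
    return inputs, u

def check_dual(N):
    """Return (feasible, objective) for the dual solution above."""
    inputs, u = or_dual_solution(N)
    f = lambda x: int(any(x))  # the OR function
    feasible = all(
        math.isclose(
            sum(u[(x, j)] * u[(y, j)] for j in range(N) if x[j] != y[j]),
            float(f(x) != f(y)),
            abs_tol=1e-9,
        )
        for x in inputs for y in inputs
    )
    objective = max(sum(u[(x, j)] ** 2 for j in range(N)) for x in inputs)
    return feasible, objective
```

Calling `check_dual(N)` for a few values of `N` should report a feasible solution whose objective is $\sqrt{N}$ up to floating-point error, matching the calculation in the text.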

A number of new quantum algorithms have been derived from the dual SDP for $\mathrm{ADV}^{\pm}$, for
instance for finding k-collisions [40], learning and testing “juntas” (functions that only depend on 
few coordinates) [43, 17], st-connectivity in graphs [46], and formula evaluation [208, 18]. In general 
it is often quite hard and non-intuitive to come up with a feasible solution {uz;,vz;} for the dual 
SDP with a small objective value, but Belovs’s learning graphs [41] can sometimes help with more 
intuitive constructions of feasible solutions. 
12.5 Perfect composition and AND-OR trees


If we have two Boolean functions $f : \{0,1\}^n \to \{0,1\}$ and $g : \{0,1\}^m \to \{0,1\}$, then we can define
the function $F = f \circ g^n$ (on $N = nm$ bits) which is their composition, as

$$F(x^1, \ldots, x^n) = f(g(x^1), \ldots, g(x^n)),$$

where each $x^i \in \{0,1\}^m$. One beautiful property of the adversary bound is perfect composition:

$$\mathrm{ADV}^{\pm}(F) = \mathrm{ADV}^{\pm}(f)\,\mathrm{ADV}^{\pm}(g).$$


There are no hidden constant factors here! The upper bound $\mathrm{ADV}^{\pm}(F) \le \mathrm{ADV}^{\pm}(f)\,\mathrm{ADV}^{\pm}(g)$
of this composition property can be proved by combining feasible solutions for the dual SDPs for
$f$ and $g$ to a feasible solution for the dual SDP for $F$. Similarly, the lower bound $\mathrm{ADV}^{\pm}(F) \ge
\mathrm{ADV}^{\pm}(f)\,\mathrm{ADV}^{\pm}(g)$ can be proved by combining feasible solutions for the primal SDPs for $f$ and $g$.
We will skip the rather technical details; see [45] for a proof of a stronger and more general result.

Because of the optimality of the generalized adversary bound, it follows from the composition
property that the quantum query complexity of $F$ equals the product of the query complexities of $f$
and $g$, up to a constant factor. For example, for the 2-level AND-OR tree on $N = k^2$ bits mentioned
at the end of Chapter 11, we immediately get an optimal $O(\sqrt{N})$-query quantum algorithm from the
fact that the $k$-bit AND and OR functions each have quantum query complexity (and hence $\mathrm{ADV}^{\pm}$)
equal to $\Theta(\sqrt{k})$. Note that we are not directly composing bounded-error algorithms here (e.g.,
trying to put Grover on top of another Grover to compute the AND-OR tree): reasoning about the
composed function at the level of the adversary bound, and only translating to quantum query
algorithms at the end, cleanly circumvents the problem of how the error probabilities of composed
bounded-error algorithms for $f$ and $g$ affect the error probability of the resulting algorithm for $F$.

This composition result, used in a more subtle way, can also give a quantum speed-up for 
evaluating game trees. Imagine a two-player game, such as chess. First white chooses one of several 
possible moves, then black chooses one of several moves, etc. We can picture this as a tree where 
the root is the initial position and the leaves are the final positions (which are win, lose, or draw). 
If we assign a binary value to each leaf indicating whether white wins, then the evaluation of the 
game as a whole is a large, multilevel, unbalanced, AND-OR tree. If it’s white’s turn and at least 
one subtree evaluates to 1, then the current position is a 1 as well: there is a winning move for white 
(this corresponds to an OR function). If it’s black’s turn and one of the subtrees from the current 
position is labeled 0 then the current position is also labeled 0 because black has a non-losing move 
(this is an AND function). The value at the root of the tree indicates whether white has a sequence 
of moves guaranteed to win or not. Using the adversary bound to do a more subtle AND-OR
composition, there is a quantum algorithm that evaluates this tree using roughly $\sqrt{N}$ queries to
the binary values at the leaves [18, 206]. In contrast, a classical algorithm has to evaluate nearly 
all N leaves in the worst case unless the fan-out of the tree is very small [215]. 
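
The AND-OR structure of a game tree described above can be made concrete with a short classical evaluator. This is only an illustrative sketch of the classical recursion, not of the quantum algorithm; the nested-list representation of the tree and the function name `evaluate` are assumptions made for the example.

```python
def evaluate(tree, white_to_move=True):
    """Evaluate an AND-OR game tree.

    A leaf is the integer 0 or 1 (1 means white wins); an internal node is a
    list of subtrees, one per possible move of the player whose turn it is.
    """
    if isinstance(tree, int):
        return tree
    # Children are evaluated lazily, so any()/all() stop as soon as the
    # value of the node is determined.
    values = (evaluate(child, not white_to_move) for child in tree)
    if white_to_move:
        return int(any(values))  # OR: white needs one winning move
    return int(all(values))      # AND: every black move must still lose for black
```

For instance, `evaluate([[1, 0], [1, 1]])` treats the root as white's move with two options, each followed by two black replies leading to leaves.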




Exercises 


1. Consider the symmetric 4-bit Boolean function $g(x_0, x_1, x_2, x_3)$ which is 1 iff the 4-bit input $x$
is increasing or decreasing, i.e., if $x \in \{0000, 0001, 0011, 0111, 1111, 1110, 1100, 1000\}$. Let $f$
be the function on $N = 4^d$ input bits obtained by composing $g$ with itself $d$ times, in a tree
of depth $d$, where the value of each internal node is obtained by applying $g$ to the values of
its 4 children, and with the input bits at the $N$ leaves.

(a) Show that $\deg(g) \le 2$ and $\deg(f) \le 2^d$.

(b) Show that the polynomial method cannot prove a lower bound better than $O(\sqrt{N})$ on
the bounded-error quantum query complexity of $f$.

(c) It is known that $\mathrm{ADV}^{\pm}(g) \ge 2.51\ldots$ (see [42, Example 3.3] for a proof, or you could
use an SDP-solver). Use this to show that $\mathrm{ADV}^{\pm}(f) \ge N^c$ for some $c > 1/2$.

Comment: This exercise shows that the generalized adversary bound can sometimes prove substantially
stronger lower bounds than the polynomial method. Incidentally, [42, Example 3.3] also shows that if we
restrict the adversary matrix to have nonnegative entries, then the best adversary lower bound we can prove
for $g$ is 2.5, and hence the best lower bound we can prove for the quantum query complexity of $f$ with such
restricted $\Gamma$, is at most $O(2.5^d)$. So this example also shows that the “negative weights” $\mathrm{ADV}^{\pm}$ can give
substantially better bounds than the “nonnegative weights” version of the adversary bound.
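2. (H) This exercise justifies Claim 1. Below, the $x, y$ always range over $D$.

(a) Let $P_1, P_0$ denote the projectors on the subspaces corresponding to outputs 1 and 0,
respectively. Suppose inputs $x, y$ have $f(x) \neq f(y)$.
Show that $\langle\psi_x^T|\psi_y^T\rangle = \langle\psi_x^T|P_{f(x)}P_{1-f(y)}|\psi_y^T\rangle + \langle\psi_x^T|P_{1-f(x)}P_{f(y)}|\psi_y^T\rangle$ for all such $x, y$.

(b) Define unnormalized states $|\phi_+\rangle = \sum_x a_x P_{f(x)}|\psi_x^T\rangle|x\rangle$ and $|\phi_-\rangle = \sum_x a_x P_{1-f(x)}|\psi_x^T\rangle|x\rangle$.
Show that $\||\phi_+\rangle\|^2 + \||\phi_-\rangle\|^2 = 1$, $\||\phi_-\rangle\|^2 \le \varepsilon$, and $\||\phi_+\rangle\|\cdot\||\phi_-\rangle\| \le \sqrt{\varepsilon(1-\varepsilon)}$.

(c) Show that $|S_T| \le 2\,|\langle\phi_-|(I \otimes \Gamma)|\phi_+\rangle|$.

(d) Show that $|S_T| \le 2\sqrt{\varepsilon(1-\varepsilon)}\,\|\Gamma\|$.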


3. (H) This exercise justifies Claim 2. It will be convenient for the proof to assume a phase-oracle,
so the $T$-query algorithm will be of the form $U_T O_{x,\pm} U_{T-1} \cdots U_1 O_{x,\pm} U_0$, applied to
initial state $|0^m\rangle$ and followed by a measurement of the first qubit to produce the output bit.
Below, the $x, y$ always range over $D$ and the $i$ ranges over $[N]$.

(a) Show that $S_t - S_{t+1} = \sum_{x,y} \Gamma_{xy}\, a_x^* a_y\, \langle\psi_x^t|(I - O_{x,\pm}O_{y,\pm})|\psi_y^t\rangle$.

(b) Let $P_i = |i\rangle\langle i| \otimes I$ be the projector on the space where the query register is $|i\rangle$. Define
$|\phi_i^t\rangle = \sum_x a_x P_i|\psi_x^t\rangle|x\rangle$. Show that $\sum_i \||\phi_i^t\rangle\|^2 = 1$.

(c) Show that $S_t - S_{t+1} = 2\sum_i \langle\phi_i^t|(I \otimes \Gamma_i)|\phi_i^t\rangle$.

(d) Show that $|S_t - S_{t+1}| \le 2\max_i \|\Gamma_i\|$.


4. (H) The following is a primal-dual pair of SDPs in so-called standard form:


$$\begin{array}{llcll}
\min & \mathrm{Tr}(CX) & \qquad & \max & b^T y\\
\text{s.t.} & \mathrm{Tr}(A_i X) = b_i \quad \forall i \in [m] & & \text{s.t.} & \sum_{i=1}^m y_i A_i \preceq C\\
 & X \succeq 0 & & &
\end{array}$$

The input here consists of Hermitian $n \times n$ matrices $C, A_1, \ldots, A_m$ and vector $b \in \mathbb{R}^m$. The
$n \times n$ matrix $X$ is the variable of the primal, and the vector $y \in \mathbb{R}^m$ is the variable of the
dual. Prove that “weak duality” always holds: for every feasible solution $X$ for the primal
and every feasible solution $y$ for the dual, we have $\mathrm{Tr}(CX) \ge b^T y$.


5. Show how the reflection $2\Pi_x - I$ can be implemented with 1 phase-query to $x$, a $Z$-gate, and
a circuit that decides if the third register is $|0\rangle$.




Chapter 13 


Quantum Complexity Theory 


13.1 Most functions need exponentially many gates 


As we have seen, quantum computers seem to provide enormous speed-ups for problems like fac- 
toring, and square-root speed-ups for various search-related problems. Could they be used to 
significantly speed up all or almost all problems? Here we will show that this is not the case: 
quantum computers are not significantly better than classical computers for most problems. 

Consider the problem of computing a Boolean function $f : \{0,1\}^n \to \{0,1\}$ by means of a
quantum circuit. Ideally, most such functions would be computable by efficient quantum circuits
(i.e., using at most poly($n$) elementary gates). Instead, we will show by means of a simple counting
argument that almost all such functions $f$ have circuit complexity nearly $2^n$. This is a variant of a
well-known counting argument for classical Boolean circuits due to Riordan and Shannon [209].

Let us fix some finite set of elementary gates, for instance the Shor basis $\{H, T, \mathrm{CNOT}\}$ or
$\{H, \mathrm{Toffoli}\}$. Suppose this set has $k$ types of gates, of maximal fanout 3. Let us try to count the
number of distinct circuits that have at most $C$ elementary gates. For simplicity we include the
initial qubits (the $n$ input bits as well as workspace qubits, which are initially $|0\rangle$) as a $(k+1)$st
type among those $C$ gates. First we need to choose which type of elementary gate each of the $C$
gates is; this can be done in $(k+1)^C$ ways. Now every gate has at most 3 ingoing and 3 outgoing
wires. For each of its 3 outgoing wires we can choose an ingoing wire into one of the gates in the
following level; this can be done in at most $(3C)^3$ ways. Hence the total number of circuits with
up to $C$ elementary gates is at most $(k+1)^C(3C)^{3C} = C^{O(C)}$. We are clearly overcounting here,
but that’s OK because we want an upper bound on the number of circuits.

We'll say that a specific circuit computes a Boolean function $f : \{0,1\}^n \to \{0,1\}$ if for every
input $x \in \{0,1\}^n$, a measurement of the first qubit of the final state (obtained by applying the circuit
to initial state $|x, 0\rangle$) gives value $f(x)$ with probability at least 2/3. Each of our $C^{O(C)}$ circuits
can compute at most one $f$ (in fact some of those circuits don’t compute any Boolean function
at all). Accordingly, with $C$ gates we can compute at most $C^{O(C)}$ distinct Boolean functions
$f : \{0,1\}^n \to \{0,1\}$. Hence even if we just want to be able to compute 1% of all $2^{2^n}$ Boolean
functions, then we already need


$$C^{O(C)} \ge \frac{1}{100}\,2^{2^n}, \quad\text{which implies } C \ge \Omega(2^n/n).$$
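
To get a feel for the numbers in this counting argument, one can compute the smallest $C$ for which the upper bound $(k+1)^C(3C)^{3C}$ on the number of circuits reaches 1% of $2^{2^n}$. The sketch below works with base-2 logarithms to avoid huge integers; the function names and the default $k = 3$ are illustrative assumptions, and the result is only the threshold implied by this particular (overcounting) bound.

```python
import math

def log2_circuit_count(C, k):
    """log2 of the upper bound (k+1)^C * (3C)^(3C) on the number of circuits."""
    return C * math.log2(k + 1) + 3 * C * math.log2(3 * C)

def min_gates(n, k=3, fraction=0.01):
    """Smallest C >= 1 whose circuit-count bound reaches fraction * 2^(2^n).

    Any circuit family computing that fraction of all n-bit Boolean functions
    needs at least this many gates, according to the counting bound.
    """
    target = 2 ** n + math.log2(fraction)  # log2(fraction * 2^(2^n))
    lo, hi = 1, 2 ** n                     # C = 2^n always reaches the target
    while lo < hi:
        mid = (lo + hi) // 2
        if log2_circuit_count(mid, k) >= target:
            hi = mid
        else:
            lo = mid + 1
    return lo
```

Comparing `min_gates(n)` with `2 ** n / n` for growing `n` illustrates the $\Omega(2^n/n)$ scaling, up to the constant factor hidden in the $\Omega$.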


Accordingly, very few computational problems will be efficiently solvable on a quantum computer. 
Below we will try to classify those using the tools of complexity theory. 
13.2 Classical and quantum complexity classes


A computational decision problem on binary strings corresponds to what is often called a “language”
in complexity theory: a language $L \subseteq \{0,1\}^*$ is a set of binary strings of arbitrary lengths, and the
corresponding decision problem is to determine whether a given string is an element of $L$ or not.
For example, $L$ could be the set of prime numbers encoded in binary (corresponding to the problem
of deciding whether a given number is prime or not) or the set of satisfiable Boolean formulas. It
is often convenient to think of such a decision problem as corresponding to a sequence of Boolean
functions $f_n : \{0,1\}^n \to \{0,1\}$, one for each input length $n$, where $f_n$ takes value 1 exactly on the
$n$-bit strings that are in $L$.

A “complexity class” is a set of decision problems (i.e., languages) that all have similar com- 
plexity in some sense, for instance the ones that can be solved with polynomial time or polynomial 
space. Let us first mention four of the most important classical complexity classes: 


• P. The class of problems that can be solved by classical deterministic computers using
polynomial time.


• BPP. The problems that can be solved by classical randomized computers using polynomial
time (and with error probability $\le 1/3$ on every input).


• NP. The problems where the ‘yes’-instances can be verified in polynomial time if some
prover gives us a polynomial-length “witness.” More precisely, a language $L$ is in NP iff
there exists a deterministic polynomial-time algorithm $A$, with two inputs $x, y$ (where $y$ is at
most polynomially longer than $x$), such that $x \in L$ iff there is a $y$ such that $A(x, y)$ outputs 1.


Some problems $L$ in this class are NP-complete, meaning that any other problem $L' \in$ NP
can be reduced to $L$ in polynomial time: there exists a polynomial-time computable function
$f$ such that $x \in L'$ iff $f(x) \in L$. Hence the NP-complete problems are the hardest problems in
NP. An example is the problem of satisfiability: we can verify that a given Boolean formula
is satisfiable if a prover gives us a satisfying assignment $y$, so the satisfiability-problem is
in NP, but one can even show that it is NP-complete. Other examples of NP-complete
problems are integer linear programming, travelling salesman, graph-colorability, etc.


• PSPACE. The problems that can be solved by classical deterministic computers using
polynomial space.
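

The verification step in the definition of NP can be illustrated for satisfiability. The sketch below plays the role of the algorithm $A(x, y)$: the formula $x$ is given in conjunctive normal form and the witness $y$ is an assignment. The encoding (signed integers for literals, a dictionary for the assignment) and the name `verify_sat` are assumptions chosen for the example.

```python
def verify_sat(cnf, assignment):
    """Return True iff `assignment` satisfies the CNF formula `cnf`.

    cnf: list of clauses; each clause is a list of nonzero integers, where the
         literal +i stands for variable i and -i for its negation.
    assignment: dict mapping each variable index i to a bool (the witness y).
    """
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in cnf
    )
```

The check takes time linear in the size of the formula, which is what makes a satisfying assignment an efficiently verifiable witness; finding such an assignment is the hard part.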


We can consider quantum analogues of all such classes, an enterprise that was started by Bernstein 
and Vazirani [53]: 


• EQP. The class of problems that can be solved exactly by quantum computers using polynomial
time. This class depends on the set of elementary gates one allows, and therefore is
not so interesting.


• BQP. The class of problems that can be solved by quantum computers using polynomial time
(and with error probability $\le 1/3$ on every input). This class is the accepted formalization
of “efficiently solvable by quantum computers.”


• “quantum NP”. In analogy with the above definition of NP, one could define quantum NP
as the class of problems where the ‘yes’-instances can be verified efficiently if some prover
gives us a “quantum witness” of a polynomial number of qubits. For every ‘yes’-instance
there should be a quantum witness that passes the verification with probability 1, while for
‘no’-instances every quantum witness should be rejected with probability 1. This class is
again dependent on the elementary gates one allows, and not so interesting.


Allowing error probability $\le 1/3$ on every input, we get a class called QMA (“quantum
Merlin-Arthur”). This is a more robust and more interesting quantum version of NP. In
particular, like NP, QMA has complete problems: problems in QMA to which every other
QMA-problem can be efficiently reduced. The most famous example of such a problem is
deciding whether the ground state energy (i.e., lowest eigenvalue) of a given $k$-local Hamiltonian
(see Chapter 9) is at most some given number $a$ or at least $a + 1/\mathrm{poly}(n)$. Determining the
ground state energy of a given physical system is extremely important in physics and chemistry.
It is not hard to see that the problem is in QMA: we can just let the quantum witness
be the ground state (i.e., an eigenstate for the lowest eigenvalue) and measure its energy using
the Hamiltonian, which is the observable corresponding to total energy. The problem turns
out to be QMA-complete already for $k = 2$ [156, 149]. We will devote Chapter 14 to this.


• QPSPACE. The problems that can be solved by quantum computers using polynomial
space. This turns out to be the same as classical PSPACE.


As explained in Appendix B.2, in all the above cases the error probability 1/3 can be reduced
efficiently to a much smaller constant $\varepsilon > 0$: just run the computation $O(\log(1/\varepsilon))$ times and take
the majority of the answers given by these runs.
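
The effect of this majority vote can be computed exactly from the binomial distribution. The following sketch assumes independent runs that each err with probability exactly $p$ (the worst case allowed by the definitions above) and an odd number of runs so that there are no ties; the function name `majority_error` is illustrative.

```python
from math import comb

def majority_error(runs, p=1 / 3):
    """Probability that the majority of `runs` independent runs is wrong.

    Each run errs independently with probability p; `runs` should be odd.
    """
    need = runs // 2 + 1  # number of wrong runs needed to flip the majority
    return sum(
        comb(runs, k) * p**k * (1 - p) ** (runs - k)
        for k in range(need, runs + 1)
    )
```

Evaluating `majority_error` for increasing odd values of `runs` shows the error probability dropping exponentially in the number of repetitions, which is why $O(\log(1/\varepsilon))$ runs suffice.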

We should be a bit careful about what we mean by a “polynomial-time [or space] quantum
algorithm.” Our model for computation has been quantum circuits, and we need a separate quantum
circuit for each new input length. So a quantum algorithm of time $p(n)$ would correspond to
a family of quantum circuits $\{C_n\}$, where $C_n$ is the circuit that is used for inputs of length $n$; it
should have at most $p(n)$ elementary gates.¹

We have BPP $\subseteq$ BQP, because a BPP-machine on a fixed input length $n$ can be written
as a polynomial-size reversible circuit (i.e., consisting of Toffoli gates) that starts from a state
that involves some coin flips. Quantum computers can generate those coin flips using Hadamard
transforms, then run the reversible circuit, and measure the final answer bit. It is believed that
BQP contains problems that aren’t in BPP, for example factoring large integers: this problem
(or rather the decision-version thereof) is in BQP because of Shor’s algorithm, and is generally
believed not to be in BPP. In the next section we will prove that BQP $\subseteq$ PSPACE. Thus we
get the following sequence of inclusions:


P $\subseteq$ BPP $\subseteq$ BQP $\subseteq$ PSPACE.


It is generally believed that P = BPP [140], while the other inclusions are believed to be strict.
Note that a proof that BQP is strictly greater than BPP (for instance, a proof that factoring
cannot be solved efficiently by classical randomized computers) would imply that P $\neq$ PSPACE,
solving what has been one of the main open problems in computer science since the 1960s. Hence
such a proof, if it exists at all, will probably be very hard.


¹To avoid smuggling loads of hard-to-compute information into this definition (e.g., $C_n$ could contain information
about whether the $n$-th Turing machine halts or not), we will require this family to be efficiently describable: there
should be a classical Turing machine which, on input $n$ and $j$, outputs (in time polynomial in $n$) the $j$-th elementary
gate of $C_n$, with information about where its incoming and outgoing wires go.
What about the relation between BQP and NP? It’s generally believed that NP-complete
problems are probably not in BQP. The main evidence for this is the lower bound for Grover
search: a quantum brute-force search on all $2^n$ possible assignments to an $n$-variable formula gives
a square-root speed-up, but not more. This is of course not a proof, since there might be some
more clever, non-brute-force methods that exploit the structure of the problem to solve satisfiability.
However, neither in the classical nor in the quantum case do we know clever methods that solve
the general satisfiability problem much faster than brute-force search.
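To get a feeling for how little a square-root speed-up helps against exponential growth, the sketch below compares the $2^n$ evaluations of classical brute-force search with roughly $(\pi/4)\sqrt{2^n}$ Grover iterations. The constant $\pi/4$ is the standard iteration count for a single satisfying assignment; that single-solution setting and the function names are assumptions made for this illustration.

```python
from math import floor, pi, sqrt

def brute_force_queries(n: int) -> int:
    """Worst-case number of assignments a classical brute-force search tries."""
    return 2**n

def grover_queries(n: int) -> int:
    """Approximate number of Grover iterations, assuming exactly one solution."""
    return floor((pi / 4) * sqrt(2**n))

if __name__ == "__main__":
    for n in (20, 40, 80):
        print(n, brute_force_queries(n), grover_queries(n))
```

Even with the quadratic improvement, the quantum query count is still exponential in $n$ (about $2^{n/2}$).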

Finally, there could also be problems in BQP that are not in NP, so it may well be that BQP 
and NP are incomparable. Much more can be said about quantum complexity classes; see for 
instance Watrous’s survey [246]. 


13.3 Classically simulating quantum computers in polynomial space 


When Richard Feynman first came up with quantum computers [111], he motivated them by 


“the full description of quantum mechanics for a large system with $R$ particles is given by
a function $\psi(x_1, x_2, \ldots, x_R, t)$ which we call the amplitude to find the particles $x_1, \ldots, x_R$
[RdW: think of $x_i$ as one qubit], and therefore, because it has too many variables, it
cannot be simulated with a normal computer with a number of elements proportional 
to R or proportional to N.” [...] 

“Can a quantum system be probabilistically simulated by a classical (probabilistic, I’d
assume) universal computer? In other words, a computer which will give the same 
probabilities as the quantum system does. If you take the computer to be the classical 
kind I’ve described so far (not the quantum kind described in the last section) and there 
are no changes in any laws, and there’s no hocus-pocus, the answer is certainly, No!” 


The suggestion to devise a quantum computer to simulate quantum physics is of course a brilliant 
one, but the main motivation is not quite accurate. As it turns out, it is not necessary to keep 
track of all (exponentially many) amplitudes in the state to classically simulate a quantum system. 
Here we will prove the result of Bernstein and Vazirani [53] that quantum computers can actually be
simulated efficiently in terms of space (though not necessarily in terms of time). 

Consider a circuit with $T = \mathrm{poly}(n)$ gates that acts on $S$ qubits, where the first $n$ of those
$S$ qubits give the classical input string. Assume for simplicity that all gates are either the 1-qubit
Hadamard or the 3-qubit Toffoli gate (as mentioned in Section 2.2, these two gates together
suffice for universal quantum computation), and that the classical output (0 or 1) of the algorithm
is determined by a measurement of the first qubit of the final state. Without loss of generality
$S \le 3T$, because $T$ Toffoli gates won't affect more than $3T$ qubits. Let $U_j$ be the unitary that
applies the $j$-th gate to its (1 or 3) qubits, and applies identity to all other qubits. The entries of
this matrix are of a simple form ($0$, $1/\sqrt{2}$, or $-1/\sqrt{2}$ for Hadamard; 0 or 1 for Toffoli) and easy to
compute. Let $|i_0\rangle = |x\rangle|0^{S-n}\rangle$ be the starting state, where $x \in \{0,1\}^n$ is the classical input, and
the second register contains the workspace qubits the algorithm uses. The final state will be

$$|\psi_x\rangle = U_T U_{T-1} \cdots U_2 U_1 |i_0\rangle.$$

The amplitude of basis state $|i_T\rangle$ in this final state is

$$\langle i_T|\psi_x\rangle = \langle i_T|U_T U_{T-1} U_{T-2} \cdots U_2 U_1|i_0\rangle.$$
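Inserting an identity matrix $I = \sum_{i \in \{0,1\}^S} |i\rangle\langle i|$ between the gates, we can rewrite this as²

$$\begin{aligned}
\langle i_T|\psi_x\rangle &= \langle i_T| U_T \Big(\sum_{i_{T-1}\in\{0,1\}^S} |i_{T-1}\rangle\langle i_{T-1}|\Big) U_{T-1} \Big(\sum_{i_{T-2}\in\{0,1\}^S} |i_{T-2}\rangle\langle i_{T-2}|\Big) U_{T-2} \cdots U_2 \Big(\sum_{i_1\in\{0,1\}^S} |i_1\rangle\langle i_1|\Big) U_1 |x, 0^{S-n}\rangle \\
&= \sum_{i_{T-1},\ldots,i_1 \in \{0,1\}^S} \; \prod_{j=1}^{T} \langle i_j|U_j|i_{j-1}\rangle.
\end{aligned}$$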


The term $\langle i_j|U_j|i_{j-1}\rangle$ is just one entry of the matrix $U_j$ and hence easy to calculate because $U_j$ acts
non-trivially on only 1 or 3 qubits (see Exercise 2). Then $\prod_{j=1}^{T} \langle i_j|U_j|i_{j-1}\rangle$ is also easy to compute,
in polynomial space (and even in polynomial time). If $\ell$ of the $T$ gates are Hadamards, then each
such term is either 0 or $\pm 1/\sqrt{2^{\ell}}$.


Adding up $\prod_{j=1}^{T} \langle i_j|U_j|i_{j-1}\rangle$ for all $i_{T-1},\ldots,i_1$ is also easy to do in polynomial space if we
reuse space for each new $i_{T-1},\ldots,i_1$.³ Hence the amplitude $\langle i_T|\psi_x\rangle$ can be computed exactly using
polynomial space. We assume that the BQP machine’s answer is obtained by measuring the first
qubit of the final state. Then its acceptance probability is the sum of squares of all amplitudes of
basis states starting with a 1: $\sum_{i_T : (i_T)_1 = 1} |\langle i_T|\psi_x\rangle|^2$. Since we can compute each amplitude $\langle i_T|\psi_x\rangle$
in polynomial space, and we can loop over all $i_T \in \{0,1\}^S$ whose first bit is 1 to sum their squared
amplitudes, the acceptance probability of a BQP-circuit on classical input $x$ can also be computed
in polynomial space. This proves the inclusion BQP $\subseteq$ PSPACE.
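The sum over intermediate basis states can be sketched in a few lines of Python. This is only an illustration under our own conventions: gates are tuples `("H", k)` or `("TOFFOLI", a, b, target)` with 0-indexed qubits, basis states are tuples of bits, and the names `gate_entry`, `amplitude` and `acceptance_probability` are hypothetical. The recursion has depth $T$ and stores one $S$-bit state per level, so memory stays polynomial while the running time is exponential.

```python
from itertools import product
from math import sqrt

def gate_entry(gate, row, col):
    """Entry <row|U|col> of an S-qubit gate, computed without building U."""
    kind = gate[0]
    if kind == "H":
        k = gate[1]
        # Every qubit other than k is acted on by the identity.
        if any(row[q] != col[q] for q in range(len(row)) if q != k):
            return 0.0
        sign = -1.0 if (row[k] == 1 and col[k] == 1) else 1.0
        return sign / sqrt(2)
    if kind == "TOFFOLI":
        _, a, b, target = gate
        image = list(col)
        image[target] ^= col[a] & col[b]  # flip target iff both controls are 1
        return 1.0 if tuple(image) == tuple(row) else 0.0
    raise ValueError(f"unknown gate {kind!r}")

def amplitude(gates, start, end):
    """<end| U_T ... U_1 |start> as a sum over intermediate basis states."""
    start, end = tuple(start), tuple(end)
    num_qubits = len(start)

    def rec(j, state):
        # rec(j, state) = <end| U_T ... U_{j+1} |state>
        if j == len(gates):
            return 1.0 if state == end else 0.0
        total = 0.0
        for nxt in product((0, 1), repeat=num_qubits):
            entry = gate_entry(gates[j], nxt, state)
            if entry != 0.0:
                total += entry * rec(j + 1, nxt)
        return total

    return rec(0, start)

def acceptance_probability(gates, start):
    """Probability that measuring the first qubit of the final state gives 1."""
    rest_bits = len(start) - 1
    return sum(
        amplitude(gates, start, (1,) + rest) ** 2
        for rest in product((0, 1), repeat=rest_bits)
    )
```

For instance, `acceptance_probability([("H", 1), ("H", 2), ("TOFFOLI", 1, 2, 0)], (0, 0, 0))` sums the squared amplitudes of all basis states whose first bit is 1 after two Hadamards and one Toffoli.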


Exercises 
1. (H) The following problem is a decision version of the factoring problem: 
Given positive integers $N$ and $k$, decide if $N$ has a prime factor $p \in \{k, \ldots, N-1\}$.


Show that if you can solve this decision problem efficiently (i.e., in time polynomial in the
input length $n = \lceil \log N \rceil$), then you can also find the prime factors of $N$ efficiently.


2. (a) Let $U$ be an $S$-qubit unitary which applies a Hadamard gate to the $k$-th qubit, and
identity gates to the other $S-1$ qubits. Let $i, j \in \{0,1\}^S$. Show an efficient way
(i.e., using time polynomial in $S$) to classically calculate the matrix-entry $U_{i,j} = \langle i|U|j\rangle$
(note: even though $U$ is a tensor product of $2 \times 2$ matrices, it’s still a $2^S \times 2^S$ matrix,
so calculating $U$ completely isn’t efficient).


(b) Let $U$ be an $S$-qubit unitary which applies a CNOT gate to the $k$-th and $\ell$-th qubits,
and identity gates to the other $S-2$ qubits. Let $i, j \in \{0,1\}^S$. Show an efficient way to
classically calculate the matrix-entry $U_{i,j} = \langle i|U|j\rangle$. Here $k$ and $\ell$ need not be adjacent,
but you may assume that they are in order to simplify your notation.


3. This exercise shows how to use BQP-algorithms as subroutines in other BQP-algorithms. 


²For the physicists: this is very similar to a path integral.
³Of course, the calculation will take exponential time, because there are $2^{S(T-1)}$ different sequences $i_{T-1},\ldots,i_1$
that we need to go over sequentially.
(a) (H) Suppose $L$ is a language in BQP. Let $f$ be the corresponding Boolean function, so
$f(x) = 1$ iff $x \in L$. Show that there is a $w \le \mathrm{poly}(n)$ and a polynomial-size quantum
circuit $U$ that implements the following map for all $x \in \{0,1\}^n$:

$$|x, 0^{w+1}\rangle \mapsto \sqrt{p}\,|x, f(x)\rangle|\phi(x)\rangle + \sqrt{1-p}\,|x, 1-f(x)\rangle|\psi(x)\rangle,$$

where $p \ge 1 - \exp(-7n)$, and $|\phi(x)\rangle$ and $|\psi(x)\rangle$ are states of the $w$-qubit workspace.


(b) Show that there is a polynomial-size quantum circuit $V$ that (when restricted to the
subspace where the workspace qubits are $|0\rangle$) is $\exp(-n)$-close in operator norm to the
following unitary:

$$O_f : |x, b, 0^w\rangle \mapsto |x, b \oplus f(x), 0^w\rangle,$$

for all $x \in \{0,1\}^n$ and $b \in \{0,1\}$.


(c) (H) Suppose $L$ is a language in BQP, and you have a polynomial-size quantum circuit
for another language $L'$ that uses queries to the language $L$ (i.e., applications of the
unitary $O_f$). Show that the language $L'$ is also in BQP: there is a polynomial-size
quantum circuit for $L'$ that doesn’t need queries to $L$.


4. (H) Consider a circuit $C$ with $T = \mathrm{poly}(n)$ elementary gates (only Hadamards and Toffolis)
acting on $S = \mathrm{poly}(n)$ qubits. Suppose this circuit computes $f : \{0,1\}^n \to \{0,1\}$ with
bounded error probability: for every $x \in \{0,1\}^n$, when we start with basis state $|x, 0^{S-n}\rangle$,
run the circuit and measure the first qubit, then the result equals $f(x)$ with probability at
least 2/3.


(a) Consider the following quantum algorithm: start with basis state $|x, 0^{S-n}\rangle$, run the
above circuit $C$ without the final measurement, apply a $Z$ gate to the first qubit, and
reverse the circuit $C$. Denote the resulting final state by $|\psi_x\rangle$. Show that if $f(x) = 0$
then the amplitude of basis state $|x, 0^{S-n}\rangle$ in $|\psi_x\rangle$ is in the interval $[1/3, 1]$, while if
$f(x) = 1$ then the amplitude of $|x, 0^{S-n}\rangle$ in $|\psi_x\rangle$ is in $[-1, -1/3]$.

(b) PP is the class of computational decision problems that can be solved by classical
randomized polynomial-time computers with success probability $> 1/2$ (however, the
success probability could be exponentially close to 1/2, i.e., PP is BPP without the ‘B’
for bounded-error). Show that BQP $\subseteq$ PP.
Chapter 14


QMA and the Local Hamiltonian 
Problem 


14.1 Quantum Merlin-Arthur (QMA) 


One can think of the complexity class NP as formalizing the standard notion of an efficiently-verifiable
proof. For example, to prove that a given formula $\varphi$ is satisfiable we give a satisfying
assignment: this is easy to verify but hard to find (unless P=NP). The theory of NP-completeness 
shows that many very different computational problems (satisfiability, TSP, integer linear program- 
ming, etc.) are essentially the same computational problem, in the sense that instances from one 
can easily be translated to instances of another in deterministic polynomial time. 

We can relax the notion of proof slightly by allowing the prover and verifier the use of randomness,
and allowing them some error probability, say 1/3. For every $x \in L$ the prover should be able
to provide a polynomial-size proof or “witness” that convinces the randomized verifier (with probability
$\ge 2/3$) that $x \in L$; while for $x \notin L$, no matter what purported “witness” the prover sends,
the verifier should only accept with probability $\le 1/3$. For historical reasons [34] the complexity
class corresponding to such $L$ is called Merlin-Arthur (MA), with Merlin referring to the
omniscient prover and Arthur referring to the mere mortal (i.e., randomized and polynomial-time)
king who is supposed to verify Merlin’s proofs.

Quantum Merlin-Arthur (QMA) is the proper quantum analogue of NP. We already mentioned
it in the previous chapter, but let us formally define QMA here. In contrast to NP, which consists
of languages $L$ where every string $x$ is either in or out of $L$, QMA is a class of promise problems. A
promise problem $L$ partitions the set $\{0,1\}^*$ of all binary strings into $L_1$, $L_0$, and $L_*$. An algorithm
is “promised” that it never receives inputs from $L_*$; if it gets an input from $L_b$ for $b \in \{0,1\}$, then
it has to determine $b$. The usual languages are promise problems where $L_* = \emptyset$.


Definition 1 A promise problem $L = (L_1, L_0, L_*)$ is in the class QMA if there exists a uniform
family $\{C_n\}$ of polynomial-size quantum circuits with two input registers and one output qubit, and
a polynomial $w$ (for “witness length”), such that for all $x \in \{0,1\}^*$:


• Completeness: If $x \in L_1 \cap \{0,1\}^n$, then there exists a $w(n)$-qubit state $|\psi\rangle$ (a “proof” or
“witness” state) such that the circuit $C_n$ outputs 1 with probability $\ge 2/3$ when run on $x, |\psi\rangle$.


• Soundness: If $x \in L_0 \cap \{0,1\}^n$, then for every $w(n)$-qubit state $|\psi\rangle$, the circuit $C_n$ outputs 1
with probability $\le 1/3$ when run on $x, |\psi\rangle$.
If we force $|\psi\rangle$ to be classical, then we get a class called QCMA. If we additionally force the verifier
to be classical, then we get MA. And if we additionally replace success probability 2/3 by 1, then
we get NP.¹ Hence NP $\subseteq$ MA $\subseteq$ QCMA $\subseteq$ QMA follows immediately from the definitions. It
is believed that MA=NP [158], for similar reasons as why we believe BPP=P [140]. However,
QMA is strongly believed to be a larger class than NP, meaning that “quantum proofs” can
prove more than classical proofs can. Moreover, as we will see later, we can identify important
QMA-complete problems, none of which is in NP (unless NP=QMA).


We have fixed the error probability here to 1/3 rather arbitrarily. We can easily reduce this to a
much smaller $\delta$ like we would for a randomized algorithm (see Appendix B.2): repeat the protocol
$O(\log(1/\delta))$ times and output the majority output bit of those runs. If done naively, we’d need a
new witness state $|\psi\rangle$ in each run, because the measurement that produces the verifier’s output bit
in each run can collapse the state. One has to be careful about soundness here, since the prover can
send a large entangled state instead of a tensor product of witness states for the individual runs;
however, it is not too hard to show that this cannot help the prover. This approach increases the
verifier’s runtime but also the required witness-size $w(n)$ by a factor $O(\log(1/\delta))$. However, there
is a beautiful and quite surprising technique (which we will not explain here) due to Marriott and
Watrous [187] that achieves the same error reduction using the same witness state! The verifier’s
runtime in that amplified protocol will still go up by a factor $O(\log(1/\delta))$, but the witness-size
remains $w(n)$.


14.2 The local Hamiltonian problem 


The quintessential NP-complete problem is satisfiability (SAT): given a formula $\varphi(x_1,\ldots,x_n)$ of
$n$ Boolean variables $x_1,\ldots,x_n$,² decide if there is an assignment of truth values to $x_1,\ldots,x_n$ that
makes the formula true. The famous Cook-Levin theorem [92, 167] says this is NP-complete.


A special case of this is $k$-SAT, where we restrict the formula $\varphi$ to be the conjunction of clauses,
each of which is the disjunction of $k$ literals (a literal is a variable $x_i$ or its negation). For example,
the following is a 3-SAT instance with 4 clauses on $n = 5$ Boolean variables:

$$(x_1 \vee \neg x_2 \vee x_3) \wedge (\neg x_1 \vee x_2 \vee x_5) \wedge (x_2 \vee \neg x_3 \vee \neg x_1) \wedge (x_3 \vee \neg x_4 \vee x_5). \tag{14.1}$$


It is well-known that $k$-SAT is still NP-complete if $k \ge 3$, while 2-SAT is actually in P.


Let us try to reformulate $k$-SAT in a way that looks “more quantum,” by relating it to the minimal
eigenvalue of a particular Hamiltonian (recall from Section 9.1 that the Hamiltonian for a physical
system is the observable corresponding to total energy) that is diagonal in the computational basis.
For concreteness we will fix $k = 3$. Consider a clause $C = x_1 \vee \neg x_2 \vee x_3$. This has one non-satisfying
assignment, namely $x_1 = 0$, $x_2 = 1$, $x_3 = 0$. With this clause let us associate the following diagonal


¹As mentioned in the previous chapter, requiring success probability 1 instead of 2/3 in QMA itself (like in NP)
leads to an unsatisfactory complexity class because it depends on what set of elementary gates one uses.

²To remain consistent with the literature we now use $n$ for the number of Boolean variables, not for the actual
length of the input, i.e., the number of bits needed to describe the instance $\varphi$. However, the latter should be at most
polynomial in $n$.
Hamiltonian:

$$H_C = \mathrm{diag}(0, 0, 1, 0, 0, 0, 0, 0).$$

Note that the 1 sits at the location indexed by $x_1x_2x_3 = 010$, the unique non-satisfying assignment
for $C$. Think of $H_C$ as giving a “penalty” of 1 to $x$ if $x$ doesn’t satisfy clause $C$. We will implicitly
treat $H_C$ as an $n$-qubit Hamiltonian, by tensoring it with identity for the other $n - 3$ qubits. That
way, we have $\langle x|H_C|x\rangle = 0$ if clause $C$ is satisfied by assignment $x$, and $\langle x|H_C|x\rangle = 1$ if not.

Now suppose we have a 3-SAT formula $\varphi = C_1 \wedge \cdots \wedge C_m$ that is the conjunction of $m$ clauses,
each with 3 literals. To this we associate the following Hamiltonian:

$$H_\varphi = \sum_{j=1}^{m} H_{C_j}.$$

Note that the eigenvalues of $H_\varphi$ lie in the interval $[0, m]$, and that $H_\varphi$ is a 3-local Hamiltonian: each
term involves only 3 of the qubits non-trivially (more generally, if we start with a $k$-SAT instance,
then $H_\varphi$ is $k$-local). Also note that the “energy” of assignment $x \in \{0,1\}^n$ is

$$\langle x|H_\varphi|x\rangle = \sum_{j=1}^{m} \langle x|H_{C_j}|x\rangle,$$

which exactly counts the number of unsatisfied clauses under assignment $x$. The “minimal energy”
(lowest eigenvalue $\lambda_{\min}$) of $H_\varphi$ is equal to the minimal number of unsatisfied clauses (which is $m$ minus the
maximal number of satisfied clauses). In particular, $\varphi$ is satisfiable iff $\lambda_{\min} = 0$.
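Since $H_\varphi$ is diagonal, its spectrum can be read off by brute force for small instances. The sketch below does this in Python under an encoding of our own choosing: a clause is a tuple of nonzero integers, where `+i` stands for $x_i$ and `-i` for $\neg x_i$ (1-indexed). The function names are hypothetical.

```python
from itertools import product

def clause_penalty(clause, x):
    """Diagonal entry <x|H_C|x>: 1 if assignment x violates the clause, else 0."""
    satisfied = any((x[abs(lit) - 1] == 1) == (lit > 0) for lit in clause)
    return 0 if satisfied else 1

def energy(clauses, x):
    """<x|H_phi|x>: the number of clauses left unsatisfied by assignment x."""
    return sum(clause_penalty(c, x) for c in clauses)

def min_energy(clauses, n):
    """Lowest eigenvalue of the diagonal Hamiltonian, by exhaustive search."""
    return min(energy(clauses, x) for x in product((0, 1), repeat=n))

if __name__ == "__main__":
    C = (1, -2, 3)  # the clause x1 OR (NOT x2) OR x3 from the text
    # Diagonal of H_C, with assignments ordered 000, 001, ..., 111.
    print([clause_penalty(C, x) for x in product((0, 1), repeat=3)])
```

The exhaustive minimum takes $2^n$ steps, which matches the fact that finding $\lambda_{\min}$ of such a Hamiltonian is as hard as satisfiability.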

The above Hamiltonian is diagonal, and hence rather “classical,” because it is the sum of 
diagonal 3-local terms. If instead we allow the terms to be arbitrary 3-local (or even k-local) 
Hamiltonians, we arrive at the central problem of this chapter. 


Definition 2 The $k$-local Hamiltonian problem is the following: given a classical description of
an $n$-qubit Hamiltonian

$$H = \sum_{j=1}^{m} H_j \tag{14.2}$$

where each $H_j$ is $k$-local (i.e., it acts nontrivially on only $k$ of the $n$ qubits) and $0 \preceq H_j \preceq I$, and
given parameters $a, b \in [0, m]$ with $b - a \ge 1/\mathrm{poly}(n)$, promised that $H$’s minimal eigenvalue $\lambda_{\min}$
is either $\le a$ or $\ge b$, decide which is the case.


Note that this is a promise problem: some $H$ of the form of Eq. (14.2) will have $\lambda_{\min}$ in $(a, b)$
and hence won’t satisfy the promise. Such instances will form the set $L_*$ of this promise problem,
while the “$\le a$” instances will form $L_1$ and the “$\ge b$” instances will form $L_0$. The input $H$ is
a $2^n \times 2^n$ matrix, but because it is $k$-local we do not need to describe it literally. Instead the
description of $H$ that is given as input will just consist of each of the $m$ terms as a $2^k \times 2^k$ matrix
($m \cdot 2^{2k}$ complex numbers) and $m\lceil \log \binom{n}{k} \rceil$ bits telling us for each of the $m$ terms on which $k$ qubits
that term acts non-trivially. Accordingly, if $m = \mathrm{poly}(n)$, $k = O(\log n)$, and each complex entry is
represented with $\mathrm{poly}(n)$ bits, then the actual input length is $\mathrm{poly}(n)$ bits.
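The size of this description is easy to tabulate. The following Python sketch counts the matrix entries and location bits for given $n$, $m$, $k$; the parameter `bits_per_entry` (how many bits encode one complex number) and the function name are assumptions made for illustration.

```python
from math import comb

def description_size(n: int, m: int, k: int, bits_per_entry: int):
    """Return (matrix entries, location bits, total bits) for m k-local terms."""
    matrix_entries = m * 2 ** (2 * k)             # m matrices of size 2^k x 2^k
    # ceil(log2(C(n, k))) bits select which k of the n qubits a term acts on;
    # (v - 1).bit_length() is an exact integer ceil(log2(v)) for v >= 1.
    location_bits = m * (comb(n, k) - 1).bit_length()
    total_bits = matrix_entries * bits_per_entry + location_bits
    return matrix_entries, location_bits, total_bits

if __name__ == "__main__":
    print(description_size(n=10, m=3, k=2, bits_per_entry=16))
```

With $k = O(\log n)$ the factor $2^{2k}$ is polynomial in $n$, so the total stays polynomial, whereas writing out the full $2^n \times 2^n$ matrix would not.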

The assumption that each $H_j$ (and therefore $H$ as well) is positive semidefinite is not essential.
If we were instead to start with the weaker condition $-I \preceq H_j \preceq I$, then by defining psd matrices
$H'_j = (H_j + I)/2$ we obtain an instance where $0 \preceq H'_j \preceq I$ with a simple relation between the
minimal eigenvalues $\lambda_{\min}$ of $H = \sum_j H_j$ and $\lambda'_{\min}$ of $H' = \sum_j H'_j$ (namely $\lambda'_{\min} = (\lambda_{\min} + m)/2$).
This allows us to also model negative energies.
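For completeness, the relation between the two minimal eigenvalues can be spelled out in one line. The derivation below only uses the definitions above and the fact that adding a multiple of the identity shifts every eigenvalue by that multiple.

```latex
\begin{align*}
H' \;=\; \sum_{j=1}^{m} H'_j
   \;=\; \sum_{j=1}^{m} \frac{H_j + I}{2}
   \;=\; \frac{1}{2}\Big(H + mI\Big),
\end{align*}
% so H and H' have the same eigenvectors, and an eigenvalue \lambda of H
% corresponds to the eigenvalue (\lambda + m)/2 of H'. In particular
\begin{align*}
\lambda'_{\min} \;=\; \frac{\lambda_{\min} + m}{2}.
\end{align*}
```

Because the map $\lambda \mapsto (\lambda + m)/2$ is increasing, thresholds $a, b$ for $H$ translate to thresholds $(a+m)/2$ and $(b+m)/2$ for $H'$, and the gap shrinks only by a factor 2.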

The minimal eigenvalue $\lambda_{\min}$ is known as the “ground state energy” of the Hamiltonian: it is
the energy of the state(s) we get if we cool an $n$-qubit system governed by $H$ to temperature 0.
Distinguishing $\lambda_{\min} \le a$ or $\lambda_{\min} \ge b$ is essentially equivalent to approximating $\lambda_{\min}$ up to an additive
error $O(b - a)$ (see Exercise 3). Finding information about $\lambda_{\min}$, and more generally about
the lowest eigenvalues of $H$ and about the structure of the “ground states” (the eigenstates for
eigenvalue $\lambda_{\min}$) is important for many problems in physics and chemistry, for instance in determining
the properties of materials at low temperatures (including poorly-understood phenomena
such as superconductivity) and the reaction speeds of chemical reactions. Much work in computational
science is expended on solving such problems for particular Hamiltonians corresponding to
physical systems of interest. Such Hamiltonians are typically indeed $k$-local for small $k$, at least
approximately, since particles tend to significantly influence only the particles close to them.

Unfortunately the local Hamiltonian problem is NP-hard already for $k \ge 2$ and very large
gap between $a$ and $b$. This follows from our above translation from SAT: we can convert a 2-SAT
instance $\varphi$ to a 2-local Hamiltonian $H_\varphi$ where $\lambda_{\min}$ equals $m$ minus the maximal number of satisfied clauses.
Computing the maximal number of satisfied clauses is known as the MAX-2-SAT problem, and
distinguishing between different values that are $\Omega(m)$ apart for its value is already known to be
an NP-hard problem [136]. Since it is generally believed that NP $\not\subseteq$ BQP, it is unlikely that
a quantum computer can solve local Hamiltonian efficiently in general. Even worse, as we will
see next, $k$-local Hamiltonian turns out to be complete for the complexity class QMA (which is
presumably larger than NP), already for $k \ge 2$, but with polynomially small gap between $a$ and $b$.


14.3 Local Hamiltonian is QMA-complete 


In this section we will show that k-local Hamiltonian is QMA-complete, proving a quantum ana- 
logue of the Cook-Levin theorem. 

First, it is not too hard to see that the problem is in QMA. The witness $|\psi\rangle$ for the instances
$x \in L_1$ would be a ground state, and there are several efficient ways to approximate its energy in
order to verify that it is indeed $\le a$ and not $\ge b$ (see Exercise 4).

Second, we need to show that $k$-local Hamiltonian is QMA-hard, meaning that any other
problem in QMA can be reduced to it. So consider an arbitrary $L = (L_1, L_0, L_*) \in$ QMA as in
Definition 1, and fix an $n$-bit input $x \in L_1 \cup L_0$. We would like to convert the circuit $C_n$ into a
Hamiltonian $H$, such that $H$ has a small eigenvalue iff $C_n$ has high acceptance probability on some
state $|\psi\rangle$. We will assume the error probability is $\le 1/4T$ rather than 1/3, where $T$ is the number of gates of $C_n$.

The circuit $C_n$ acts on $n + s + w(n)$ qubits, where the first $n$ qubits contain the fixed classical
input $x$ (which we will omit below for simplicity), the circuit uses $s$ workspace qubits (which start
out as $|0\rangle$), and the third register contains the purported $w(n)$-qubit witness state. Since $C_n$
consists of some $T = \mathrm{poly}(n)$ gates, we can write it as a product $C_n = U_T \cdots U_1$ where each $U_t$
is an elementary gate on 1 or 2 qubits, tensored with identity on the other qubits. For a given
$w(n)$-qubit state $|\psi\rangle$, let $|\psi_0\rangle = |0^s\rangle|\psi\rangle$ and $|\psi_t\rangle = U_t|\psi_{t-1}\rangle$ for $t \in [T]$ be the initial, intermediate,
and final states of the algorithm. The output qubit of the circuit is obtained by measuring the first
qubit of the final state $|\psi_T\rangle$ in the computational basis.

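We will now describe a Hamiltonian $H$ that “follows” the state and gives “penalties” for every
deviation from the proper sequence of states $|\psi_0\rangle, \ldots, |\psi_T\rangle$, as well as penalizing a 0-output in
the final measurement. In addition to the register that $U$ acts on, we will add another register
of $\lceil \log(T+1) \rceil$ qubits that acts like a “clock,” ranging from 0 to $T$. We will subscript the first
$s + w(n)$ qubits by integers in $\{1, \ldots, s + w(n)\}$, and subscript the clock register by ‘C’. Define

$$\begin{aligned}
H_{\text{init}} &= \sum_{i=1}^{s} |1\rangle\langle 1|_i \otimes |0\rangle\langle 0|_C \\
H_t &= \tfrac{1}{2}\big(I \otimes |t\rangle\langle t|_C + I \otimes |t-1\rangle\langle t-1|_C - U_t \otimes |t\rangle\langle t-1|_C - U_t^\dagger \otimes |t-1\rangle\langle t|_C\big), \quad \text{for } t \in [T] \\
H_{\text{final}} &= |0\rangle\langle 0|_1 \otimes |T\rangle\langle T|_C \\
H &= H_{\text{init}} + \sum_{t=1}^{T} H_t + H_{\text{final}}
\end{aligned}$$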


The number of terms in $H$ is $m = s + T + 1$. The idea behind this Hamiltonian $H$ is that $H_{\text{init}}$
checks that the $s$ workspace qubits are all 0 in the initial state, where the clock register is 0 (giving
an “energy penalty” if some of those workspace qubits are 1); $H_t$ checks that $U_t$ is applied properly
in the $t$-th step (the factor 1/2 is to ensure $H_t \preceq I$); and $H_{\text{final}}$ checks that the output qubit in the
final state is 1 (giving a penalty if it’s 0). Because the clock register uses $\lceil \log(T+1) \rceil$ qubits, and
each gate $U_t$ acts on at most 2 qubits, the locality of $H$ is $k = \lceil \log(T+1) \rceil + 2 = O(\log n)$. We
will reduce this to a constant later.


14.3.1 Completeness and soundness 


We now want to show that we can distinguish $x \in L_1$ and $x \in L_0$ by considering the smallest
eigenvalue $\lambda_{\min}$ of the above Hamiltonian $H$. First, for completeness we want to show that if
$x \in L_1$ then there is a state with small eigenvalue. Since $x \in L_1$, there is a $w(n)$-qubit witness
state $|\psi\rangle$ that leads $U$ with initial state $|\psi_0\rangle = |0^s\rangle|\psi\rangle$ to accept (i.e., output 1) with probability
$\ge 1 - 1/4T$. Consider the following state:

$$|\psi'\rangle = \frac{1}{\sqrt{T+1}} \sum_{t=0}^{T} |\psi_t\rangle|t\rangle.$$

This state is sometimes called the “history state” of the circuit $U$, and you can think of it as the
quantum analogue of a satisfying assignment in classical SAT. It faithfully “follows” the intermediate
states of the computation. This means $|\psi'\rangle$ gets penalty 0 from $H_{\text{init}}$ and from each $H_t$. Since
the probability of getting (the incorrect) measurement outcome 0 is $\le 1/4T$, we have


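$$\lambda_{\min} \le \langle\psi'|H|\psi'\rangle = \langle\psi'|H_{\text{final}}|\psi'\rangle = \frac{1}{T+1}\,\langle\psi_T|\big(|0\rangle\langle 0|_1 \otimes I\big)|\psi_T\rangle \le \frac{1}{4T(T+1)} =: a.$$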
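The completeness calculation can be checked numerically on a toy circuit. The Python sketch below is a simplification under assumptions of our own: there is a single data qubit (so $s = 1$ and $w(n) = 0$, with the input omitted), the clock is treated as one $(T+1)$-level register rather than $\lceil\log(T+1)\rceil$ qubits, and a state is stored as a list `blocks`, where `blocks[t]` is the (unnormalized) data-register vector attached to clock value $t$. All function names are hypothetical.

```python
from math import sqrt

def matvec(M, v):
    return [sum(M[r][c] * v[c] for c in range(len(v))) for r in range(len(M))]

def dagger(M):
    return [[complex(M[c][r]).conjugate() for c in range(len(M))]
            for r in range(len(M[0]))]

def inner(u, v):
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def history_state(gates, start):
    """Blocks of (1/sqrt(T+1)) * sum_t |psi_t>|t>, with |psi_t> = U_t|psi_{t-1}>."""
    T = len(gates)
    blocks = [[a / sqrt(T + 1) for a in start]]
    for U in gates:
        blocks.append(matvec(U, blocks[-1]))
    return blocks

def history_energy(blocks, gates):
    """<psi'|H|psi'> with H = H_init + sum_t H_t + H_final, applied block-wise."""
    T = len(gates)
    e_init = abs(blocks[0][1]) ** 2    # |1><1| on the workspace qubit at clock 0
    e_final = abs(blocks[T][0]) ** 2   # |0><0| on the output qubit at clock T
    e_prop = 0.0
    for t in range(1, T + 1):
        U = gates[t - 1]
        # H_t only couples clock values t-1 and t.
        out_t = [0.5 * (a - b) for a, b in zip(blocks[t], matvec(U, blocks[t - 1]))]
        out_prev = [0.5 * (a - b)
                    for a, b in zip(blocks[t - 1], matvec(dagger(U), blocks[t]))]
        e_prop += (inner(blocks[t], out_t) + inner(blocks[t - 1], out_prev)).real
    return e_init + e_prop + e_final

H_GATE = [[1 / sqrt(2), 1 / sqrt(2)], [1 / sqrt(2), -1 / sqrt(2)]]
Z_GATE = [[1.0, 0.0], [0.0, -1.0]]
```

For the circuit `[H_GATE, Z_GATE, H_GATE]`, which maps $|0\rangle$ to $|1\rangle$, the history state has energy (numerically) 0. For `[H_GATE, H_GATE]`, which outputs 0 with certainty, only $H_{\text{final}}$ contributes and the energy is $1/(T+1) = 1/3$.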
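Second, to prove soundness we’ll show that if $x \in L_0$, then $\lambda_{\min}$ is at least $b := 2a$. Consider any
purported witness state $|\psi'\rangle$. We can write this as

$$|\psi'\rangle = \sum_{t=0}^{T} \alpha_t |\phi_t\rangle|t\rangle$$

for some nonnegative reals $\alpha_t$ and normalized states $|\phi_t\rangle$. Note that

$$\begin{aligned}
\langle\psi'|H_t|\psi'\rangle &= \big(\alpha_{t-1}\langle\phi_{t-1}|\langle t-1| + \alpha_t\langle\phi_t|\langle t|\big)\, H_t\, \big(\alpha_{t-1}|\phi_{t-1}\rangle|t-1\rangle + \alpha_t|\phi_t\rangle|t\rangle\big) \\
&= \tfrac{1}{2}\big(\alpha_{t-1}^2 + \alpha_t^2 - \alpha_{t-1}\alpha_t\langle\phi_t|U_t|\phi_{t-1}\rangle - \alpha_{t-1}\alpha_t\langle\phi_{t-1}|U_t^\dagger|\phi_t\rangle\big) \\
&= \tfrac{1}{2}\big\|\alpha_t|\phi_t\rangle - \alpha_{t-1}U_t|\phi_{t-1}\rangle\big\|^2. \tag{14.3}
\end{aligned}$$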

So intuitively, assuming $\alpha_t \approx \alpha_{t-1}$, the Hamiltonian term $H_t$ gives a penalty proportional to how
much $|\phi_t\rangle$ deviates from $U_t|\phi_{t-1}\rangle$, i.e., from a correct application of the $t$-th gate of $C_n$.
For ease of presentation we will now make the following three simplifying assumptions.³


• All $\alpha_t$ are equal to $1/\sqrt{T+1}$, as they would be in the history state. This assumption is
reasonable because $|\phi_t\rangle$ and $U_t|\phi_{t-1}\rangle$ both have norm 1, so differences between $\alpha_t$ and $\alpha_{t-1}$
will only make the penalty of Eq. (14.3) bigger.

• $|\phi_0\rangle$ starts with $s$ 0s, so $|\phi_0\rangle = |0^s\rangle|\psi\rangle$ for some $w(n)$-qubit state $|\psi\rangle$. This is reasonable
because if $|\phi_0\rangle$ deviates significantly from this form, then $H_{\text{init}}$ will give a large energy penalty.


• $|\phi_T\rangle$ has acceptance probability close to 1. This is reasonable because if $|\phi_T\rangle$ has low acceptance
probability, then $H_{\text{final}}$ will give a large energy penalty.


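Because of the second item and the fact that $x \in L_0$, the state $C_n|\phi_0\rangle$ must have acceptance
probability close to 0. Comparing with the third item, it follows that $|\phi_T\rangle$ and $C_n|\phi_0\rangle$ must be
nearly orthogonal, so their distance is close to $\sqrt{2}$, and in particular at least 1. This implies

$$\begin{aligned}
1 \le \big\| |\phi_T\rangle - C_n|\phi_0\rangle \big\| &= \Big\| \sum_{t=1}^{T} \big(U_T \cdots U_{t+1}|\phi_t\rangle - U_T \cdots U_t|\phi_{t-1}\rangle\big) \Big\| \\
&\le \sum_{t=1}^{T} \big\| U_T \cdots U_{t+1}|\phi_t\rangle - U_T \cdots U_t|\phi_{t-1}\rangle \big\| = \sum_{t=1}^{T} \big\| |\phi_t\rangle - U_t|\phi_{t-1}\rangle \big\|.
\end{aligned}$$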


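Here the first equality uses a telescoping sum, the second inequality is the triangle inequality, and
the last equality is because the operator norm is unitarily invariant ($\|Uv - Uw\| = \|v - w\|$).
Using Eq. (14.3) with $\alpha_t = \alpha_{t-1} = 1/\sqrt{T+1}$, and Cauchy-Schwarz, we now have:

$$\begin{aligned}
\langle\psi'|H|\psi'\rangle \ge \sum_{t=1}^{T} \langle\psi'|H_t|\psi'\rangle &= \frac{1}{2(T+1)} \sum_{t=1}^{T} \big\| |\phi_t\rangle - U_t|\phi_{t-1}\rangle \big\|^2 \\
&\ge \frac{1}{2T(T+1)} \Big( \sum_{t=1}^{T} \big\| |\phi_t\rangle - U_t|\phi_{t-1}\rangle \big\| \Big)^2 \ge \frac{1}{2T(T+1)} = b.
\end{aligned}$$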


Accordingly, if $x \in L_0$ then $\lambda_{\min} = \min_{|\psi'\rangle} \langle\psi'|H|\psi'\rangle \ge b$. If $x \in L_1$ then the history state shows
$\lambda_{\min} \le a = 1/(4T(T+1))$. We also have $b - a = 1/(4T(T+1)) \ge 1/\mathrm{poly}(n)$, as required.
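The arithmetic behind the thresholds can be double-checked with exact fractions. In the sketch below, `thresholds` returns $a$, $b = 2a$ and the gap $b - a$ for a given $T$, and `soundness_bound` evaluates the quantity $\frac{1}{2(T+1)}\sum_t d_t^2$ from the derivation above for given distances $d_t = \| |\phi_t\rangle - U_t|\phi_{t-1}\rangle \|$. Both function names are our own, and the distances are supplied by hand rather than computed from a circuit.

```python
from fractions import Fraction

def thresholds(T: int):
    """Return (a, b, b - a) with a = 1/(4T(T+1)) and b = 2a."""
    a = Fraction(1, 4 * T * (T + 1))
    b = 2 * a
    return a, b, b - a

def soundness_bound(distances):
    """(1 / (2(T+1))) * sum_t d_t^2, where T = len(distances)."""
    T = len(distances)
    return sum(d * d for d in distances) / (2 * (T + 1))

if __name__ == "__main__":
    a, b, gap = thresholds(3)
    print(a, b, gap)
    # Distances that sum to exactly 1: the equal split minimizes the bound.
    print(soundness_bound([Fraction(1, 3)] * 3))
    print(soundness_bound([Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]))
```

When the distances sum to 1, the equal split $d_t = 1/T$ gives exactly $b$, and any unequal split gives a larger value, which is what the Cauchy-Schwarz step expresses.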


³Exercise 5 shows this is without loss of generality, though the lower bound on $b - a$ becomes a worse polynomial
(1/T° instead of $1/T^2$) if we drop these three assumptions and take the unary clock of the next section into account.
14.3.2 Reducing the locality


The above construction of $H$, with the history state as witness for $x \in L_1$, is due to Kitaev [156],
who was inspired by an earlier clock construction in [112]. Our proof of soundness is a bit different
from Kitaev’s. He also showed that the locality can be reduced from $O(\log n)$ to 5 by representing
the clock in unary: $t = 0$ would now be represented by $|0^T\rangle_C$, $t = 1$ by $|10^{T-1}\rangle_C$, $t = 2$ by $|110^{T-2}\rangle_C$,
etc. This now requires $T$ qubits to represent the clock instead of $\lceil \log(T+1) \rceil$. Denoting the $t$-th
qubit of the clock by ‘$C_t$’, for $t \in [T]$, the previous terms in $H$ now become


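$$\begin{aligned}
H_{\text{init}} &= \sum_{i=1}^{s} |1\rangle\langle 1|_i \otimes |0\rangle\langle 0|_{C_1} \\
H_t &= \tfrac{1}{2}\big(I \otimes |100\rangle\langle 100|_{C_{t-1},C_t,C_{t+1}} + I \otimes |110\rangle\langle 110|_{C_{t-1},C_t,C_{t+1}} \\
&\qquad - U_t \otimes |110\rangle\langle 100|_{C_{t-1},C_t,C_{t+1}} - U_t^\dagger \otimes |100\rangle\langle 110|_{C_{t-1},C_t,C_{t+1}}\big) \\
H_{\text{final}} &= |0\rangle\langle 0|_1 \otimes |1\rangle\langle 1|_{C_T}
\end{aligned}$$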


($H_1$ and $H_T$ have a slightly different form than the $H_t$ written above, because the clock register
starts resp. ends there.) We also add the following to penalize a $T$-bit clock register that doesn’t
conform to the proper 1s-followed-by-0s format of a unary number:

$$H_{\text{clock}} = \sum_{t=1}^{T-1} |01\rangle\langle 01|_{C_t,C_{t+1}}.$$


Note that the terms in the Hamiltonian now only “touch” at most 5 qubits: in particular, each $H_t$
touches 1 or 2 qubits for the gate $U_t$, and 3 qubits of the clock. This shows $k$-local Hamiltonian
is QMA-complete for $k \ge 5$. Subsequently, Kempe, Kitaev, and Regev [149] showed that $k$-local
Hamiltonian is QMA-complete already for $k = 2$ (in contrast to 2-SAT, which is in P).
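The unary clock encoding and the penalty that $H_{\text{clock}}$ assigns to computational basis states of the clock register can be illustrated with a few lines of Python. Representing clock states as strings of `'0'`/`'1'` characters, and the function names, are choices made for this sketch only.

```python
def unary_clock(t: int, T: int) -> str:
    """Unary encoding of time t on T clock qubits: t ones followed by T - t zeros."""
    if not 0 <= t <= T:
        raise ValueError("t must be between 0 and T")
    return "1" * t + "0" * (T - t)

def clock_penalty(bits: str) -> int:
    """<bits|H_clock|bits>: number of adjacent positions that read '01'."""
    return sum(1 for i in range(len(bits) - 1) if bits[i:i + 2] == "01")

if __name__ == "__main__":
    T = 5
    for t in range(T + 1):
        print(t, unary_clock(t, T), clock_penalty(unary_clock(t, T)))
    print(clock_penalty("01010"))  # an invalid clock string gets a positive penalty
```

A basis string has penalty 0 exactly when no 0 is followed by a 1, i.e., when it is a valid unary number, and each term of $H_{\text{clock}}$ looks at only 2 clock qubits.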

All of these results assume a polynomially small gap $b - a$. Intuitively, the local Hamiltonian
problem becomes easier if the gap between $a$ and $b$ becomes bigger, since the 1-instances and
0-instances are further apart and should be easier to distinguish. One may ask whether $k$-local
Hamiltonian is still QMA-complete if $b - a \ge \Omega(m)$ instead of $\ge 1/\mathrm{poly}(n)$. We know that it is
at least NP-hard for all $k \ge 2$ because of the connection with MAX-2-SAT mentioned at the end
of Section 14.2, but whether it is also QMA-hard is a longstanding open problem in the field of
Hamiltonian complexity [119], known as the “quantum PCP conjecture” [6].


14.4 Other interesting problems in QMA 


The main reason NP-completeness is a prominent notion in computer science, is that very many in- 
teresting and practically important computational problems are NP-complete: satisfiability, Trav- 
eling Salesman, scheduling problems, integer linear programming, protein folding, and many more. 
See [117] for an already very extensive list from the late-70s. Similarly (though not yet as exten- 
sively) there is a growing list of QMA-complete problems, often from physics or chemistry. Below 
we list a few without proof of their QMA-hardness; the fact that these problems are in QMA is 
usually easy to show. See [8, 61] for more. 


• The local Hamiltonian problem for Hamiltonians corresponding to actual physical
systems, such as the 2-local Ising model with 1-local transverse field and a tunable 2-local
transverse coupling [58], the 2D Heisenberg Hamiltonian with local magnetic fields [220], the
2D Hubbard Hamiltonian with local magnetic fields [220], and the Bose-Hubbard model [84].
• Non-Identity check [144]. Given a polynomial-size quantum circuit $U$ on $n$ qubits, determine
whether $U$ is not close to the identity up to some global phase:
(1) for all $\phi \in [0, 2\pi)$ we have $\|U - e^{i\phi}I\| \ge b$ or
(0) there is a $\phi \in [0, 2\pi)$ such that $\|U - e^{i\phi}I\| \le a$,
promised one of these is the case, and $b - a \ge 1/\mathrm{poly}(n)$.


• $k$-local density matrix consistency [170, 171, 68].⁴ Given $m = \mathrm{poly}(n)$ density matrices
$\rho_1, \ldots, \rho_m$ where the state of $\rho_i$ is only on qubits $C_i \subseteq [n]$ with $|C_i| \le k$, determine whether:
(1) there is an $n$-qubit density matrix $\rho$ such that for all $i \in [m]$, $\mathrm{Tr}_{[n]\setminus C_i}(\rho) = \rho_i$, or
(0) for all $n$-qubit density matrices $\rho$ there is an $i \in [m]$ such that $\|\mathrm{Tr}_{[n]\setminus C_i}(\rho) - \rho_i\| \ge b$,
promised one of these is the case, and $b \ge 1/\mathrm{poly}(n)$.


The following problem is in QMA, is not known to be QMA-complete, but also not known to be 
in NP (in contrast, group membership is known to be in NP). 


• Group non-membership [243]. Given a finite group $G$, subgroup $H \le G$, and an element
$g \in G$, determine whether:
(1) $g \notin H$, or
(0) $g \in H$.


Here the groups H, G could be given as multiplication tables for a set of generating elements, or by 
means of an oracle (black-box) for the multiplication. The witness state for case (1) is the uniform 
superposition over H. Exercise 9 asks you to show completeness; proving soundness is a bit harder. 


14.5 Quantum interactive proofs 


The prover-verifier protocols in QMA, just like NP and MA, only allow one message from the 
prover to the verifier. This is akin to submitting a proof to a journal, where the referee then 
verifies the correctness of the proof without further interaction with the prover. One can also allow 
multiple rounds of interaction, formalizing the back-and-forth situation that often occurs when a 
mathematician (the prover) proves a complicated theorem in front of a colleague (the verifier): the 
verifier can raise objections or ask questions about steps of the proof, which the prover answers 
(hopefully to the verifier’s satisfaction) before proceeding with the next steps. A “proof” here is a 
very general notion: it’s any polynomial interaction that convinces the verifier of true statements 
and cannot convince the verifier about any false statements. 

The complexity class IP consists of those languages that can be decided by a polynomial 
interaction between an unbounded prover and a polynomial-time classical verifier. Again, if x ∈ L_1 
then the prover should succeed in convincing the verifier to accept (with probability > 2/3), and if 
x ∈ L_0 then no matter what the prover does, the verifier should reject with probability > 2/3. A 
fundamental classical complexity theory result says that IP=PSPACE [181, 224, 225]. 

One can define quantum IP (QIP) analogously. The two main results known about QIP are: 


1. Every QIP protocol can be implemented with only 3 messages, with the prover starting [245, 
154]. Roughly speaking, the 3-message protocol starts from a poly(n)-message protocol for 
an L € QIP, using the 3 messages to check one randomly chosen one among the poly(n) 
messages. This results in a 3-message protocol with 1/poly(n) gap between completeness and 
soundness parameters for the same L. This small gap can then be amplified to a constant 
gap by repeating the protocol in parallel; this repetition increases the communication per 
message but not the number of messages, which remains 3. In contrast, it is widely believed 
that IP restricted to 3-message protocols is a much smaller class than the full class IP. 

⁴ A density matrix is a generalization of a pure state. See Chapters 15 and 18 for the notation used here. 


2. QIP=IP [143]. Roughly speaking, this is proved by showing that the acceptance probability 
of the optimal strategy of the prover in a QIP-protocol can be described implicitly by an 
exponential-size semidefinite program. Its optimal value (which is either > 2/3 or < 1/3) can 
then be approximated by an exponential-size but polynomial-depth circuit. Such circuits can 
be implemented in PSPACE, and we already knew that IP=PSPACE. 


Accordingly, adding quantum to the model of interactive proofs does not change the class of lan- 
guages that can be decided, but does reduce the required number of messages between prover and 
verifier from polynomial to constant. See [242] for much more about quantum proof systems. 


Exercises 
1. Give a satisfying assignment for the 3-SAT instance of Eq. (14.1). 


2. Show that 1-local Hamiltonian is in P. 


3. (H) Suppose you had an efficient quantum algorithm for the k-local Hamiltonian problem for 
every a, b that satisfy b − a > 1/n. Give an efficient quantum algorithm that approximates 
λ_min to within additive error ±2/n. 


4. Show that k-local Hamiltonian is in QMA in two different ways, by providing details for the 
following two sketches: 


(a) Choose a j ∈ [m] uniformly at random and measure the observable H_j on the witness 
state. Repeat this a few times (using new witness states each time) to approximate the 
expected value. 


(b) Apply phase estimation (Section 4.6) to the unitary U = e^{iH} with the given witness 
state; U can be implemented using Hamiltonian simulation (Chapter 9). 


5. This long exercise completes the proof of the soundness for the Hamiltonian of Section 14.3.2, 
with the unary clock and without the three simplifying assumptions of Section 14.3.1. The 
gap b − a between completeness and soundness will now be Ω(1/T^5) instead of Θ(1/T^2). 


(a) Assume the error probability of the QMA-protocol for L is ≤ 1/T^5. Let |ψ'''⟩ be an 
arbitrary ground state (with energy λ_min) for the Hamiltonian of Section 14.3.2. Show 
that if x ∈ L_1, then λ_min ≤ 1/T^6. 


(b) (H) For the remainder of the exercise assume x ∈ L_0. Let P_bc be the projector on the 
subspace of bad (i.e., non-unary) clock states. Show that ‖P_bc|ψ'''⟩‖^2 ≤ λ_min. 


(c) Show that ‖H‖ ≤ O(T). 


(d) (H) Let |ψ''⟩ be the state obtained from |ψ'''⟩ by removing P_bc|ψ'''⟩ and renormalizing. 
Show that its energy λ'' cannot be much larger than λ_min: λ'' ≤ O(T λ_min). 


(e) (H) Write |ψ''⟩ = Σ_{t=0}^{T} a_t|φ_t⟩|t⟩ for nonnegative reals a_t and normalized states |φ_t⟩. 
Show that Σ_{t=0}^{T−1} |a_t − a_{t+1}|^2 ≤ 2λ''. 


(f) (H) Show that for all t, t' we have |a_t − a_{t'}|^2 ≤ 2λ''T. 


(g) (H) Let |ψ'⟩ = (1/√(T+1)) Σ_{t=0}^{T} |φ_t⟩|t⟩ be the state after making all amplitudes equal in |ψ''⟩. 
Show that ‖|ψ'⟩ − |ψ''⟩‖^2 ≤ O(T^2 λ''). 

(h) Modify |φ_0⟩ and |φ_T⟩ to satisfy the second and third simplifying assumptions. 

(i) Show that the energy λ' of |ψ'⟩ cannot be much larger than λ'': λ' ≤ O(T^2 λ''). 

(j) (H) Show that λ_min ≥ Ω(1/T^5). 


6. Consider a promise problem L = (L_1, L_0, L_*) ∈ QMA and a protocol as in Definition 1 with 
witness states of w(n) qubits. 


(a) (H) Show that there is a QMA protocol for L with witness states of w(n) qubits and 
error probability ≤ (1/3)·2^{−w(n)}. 


(b) Suppose we now replace the w(n) qubits of the protocol of (a) with a uniformly random 
w(n)-bit basis state. Show that if x ∈ L_1, then the acceptance probability (i.e., the 
probability of output 1) is ≥ (2/3)·2^{−w(n)}, while if x ∈ L_0 then it is ≤ (1/3)·2^{−w(n)}. 


(c) Use (b) to show that QMA with witness states restricted to w(n) = O(log n) qubits 
equals BQP. 


7. (H) Let PP be the class of promise problems L = (L_1, L_0, L_*) that can be decided by a 
polynomial-time classical algorithm with success probability > 1/2 (meaning that for inputs 
x ∈ L_1 the algorithm accepts with probability > 1/2, and for x ∈ L_0 it accepts with 
probability < 1/2). Show that QMA ⊆ PP. 


8. Consider the following computational decision problem. We are given a Hamiltonian H of 
the form of page 117, with the additional property that w(n) = 0. We are promised that the 
smallest eigenvalue λ_min of H is either ≤ 1/(4T(T + 1)) (“yes-instance”) or ≥ 1/(2T(T + 1)) 
(“no-instance”), and the problem is to decide which case we are in. 


(a) (H) Show that this problem is in BQP. 


(b) Show that this problem is BQP-hard, meaning that for every promise problem L = 
(L_1, L_0, L_*) in BQP there exists a classical deterministic polynomial-time algorithm 
that maps every x ∈ L_1 to a yes-instance of the above problem and every x ∈ L_0 to a 
no-instance. 


9. Suppose we are given (in some form) a finite group G, a subgroup H ≤ G, and an element 
g ∈ G, and we can efficiently implement the unitary map V corresponding to multiplication 
with g (i.e., the map V : |h⟩ ↦ |h ∘ g⟩). Let 

\[ |\psi\rangle = \frac{1}{\sqrt{|H|}} \sum_{h \in H} |h\rangle \]

be the uniform superposition over H. The prover for the non-membership problem can 
construct this state, though not necessarily efficiently. 

(a) Show that if g ∈ H, then V|ψ⟩ = |ψ⟩. 
(b) Show that if g ∉ H, then V|ψ⟩ is orthogonal to |ψ⟩. 
(c) Consider the following procedure that the verifier can use to test if g ∈ H or not 
(in steps (1) and (3), H denotes the Hadamard gate rather than the subgroup): 
(1) prepare an auxiliary qubit in state H|0⟩, 
(2) conditioned on that qubit apply V to |ψ⟩, 
(3) apply H to the auxiliary qubit and measure it. 
Show that the probability of measurement outcome 0 is 1 if g ∈ H, and is 1/2 if g ∉ H. 
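
The test in part (c) can be checked numerically on a toy instance. The sketch below is only an illustration under stated assumptions: it uses the additive group Z_n (so "multiplication with g" is addition mod n), and the function name acceptance_probability is hypothetical, not part of the notes.

```python
import numpy as np

def acceptance_probability(n, subgroup, g):
    """Probability of outcome 0 in the controlled-V test, for the additive group Z_n."""
    # |psi> = uniform superposition over the subgroup
    psi = np.zeros(n)
    psi[list(subgroup)] = 1 / np.sqrt(len(subgroup))
    # V|h> = |h + g mod n> is a permutation matrix
    V = np.zeros((n, n))
    for h in range(n):
        V[(h + g) % n, h] = 1
    # After Hadamard, controlled-V, Hadamard on the auxiliary qubit, the
    # branch with auxiliary qubit 0 carries the vector (|psi> + V|psi>)/2.
    branch0 = (psi + V @ psi) / 2
    return float(branch0 @ branch0)

subgroup = {0, 2, 4}                      # a subgroup of Z_6
p_member = acceptance_probability(6, subgroup, 4)      # g in the subgroup
p_nonmember = acceptance_probability(6, subgroup, 3)   # g outside the subgroup
```

Here p_member is 1 and p_nonmember is 1/2 up to rounding, matching the two cases of part (c); this does not replace the proof that the exercise asks for.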
Chapter 15 


Quantum Encodings, with a 
Non-Quantum Application 


15.1 Mixed states and general measurements 


So far, we have restricted our states to so-called pure states: unit vectors of amplitudes. In the 
classical world we often have uncertainty about the state of a system, which can be expressed by 
viewing the state as a random variable that has a certain probability distribution over the set 
of basis states. Similarly we can define a mixed quantum state as a probability distribution (or 
“mixture”) over pure states. While pure states are written as vectors, it is most convenient to write 
mixed states as density matrices. A pure state |φ⟩ corresponds to the density matrix |φ⟩⟨φ|, which 
is the outer product of the vector |φ⟩ with itself. For example, the pure state |φ⟩ = α|0⟩ + β|1⟩ 
corresponds to the density matrix 

\[ |\phi\rangle\langle\phi| = \begin{pmatrix} \alpha \\ \beta \end{pmatrix} \begin{pmatrix} \alpha^* & \beta^* \end{pmatrix} = \begin{pmatrix} |\alpha|^2 & \alpha\beta^* \\ \alpha^*\beta & |\beta|^2 \end{pmatrix}. \]

A mixed state that is in pure states |φ_1⟩, ..., |φ_ℓ⟩ with probabilities p_1, ..., p_ℓ, respectively, 
corresponds to the density matrix ρ = Σ_{i=1}^{ℓ} p_i|φ_i⟩⟨φ_i|. This ρ is sometimes called a “mixture” of the 
states |φ_1⟩, ..., |φ_ℓ⟩.¹ The set of density matrices is exactly the set of positive semidefinite (psd) 
matrices of trace 1. A mixed state is pure if, and only if, it has rank 1. 

You can always write a mixed state ρ as a probability distribution over orthogonal pure states, 
using the diagonalization of ρ (see Appendix A.5) plus the observations that (1) the eigenvalues of 
a trace-1 psd matrix form a probability distribution, and (2) that the eigenvectors of a Hermitian 
matrix can be assumed to form an orthonormal set without loss of generality. But you can also 
write ρ as a convex combination of non-orthogonal states (see Exercise 1.c). 
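
As a small numerical illustration of these facts (a sketch using NumPy; the particular mixture of |0⟩ and |+⟩ is an arbitrary choice made for this example), one can build a density matrix from non-orthogonal states and then recover an orthogonal decomposition from its eigenvectors:

```python
import numpy as np

ket0 = np.array([1.0, 0.0])
ket_plus = np.array([1.0, 1.0]) / np.sqrt(2)

# Mixture of two non-orthogonal pure states, each with probability 1/2
rho = 0.5 * np.outer(ket0, ket0.conj()) + 0.5 * np.outer(ket_plus, ket_plus.conj())

# rho is Hermitian, so eigh returns real eigenvalues and orthonormal eigenvectors
eigvals, eigvecs = np.linalg.eigh(rho)
trace = np.trace(rho).real
rank = int(np.sum(eigvals > 1e-12))

# The same rho, rewritten as a mixture of its orthonormal eigenvectors
rebuilt = sum(p * np.outer(v, v.conj()) for p, v in zip(eigvals, eigvecs.T))
```

The eigenvalues are nonnegative and sum to 1, so they form a probability distribution over the two eigenvectors; the rank is 2, so this state is not pure.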

Applying a unitary U to a pure state |φ⟩ gives pure state U|φ⟩. Written in terms of rank-1 
density matrices, this corresponds to the map 

\[ |\phi\rangle\langle\phi| \mapsto U|\phi\rangle\langle\phi|U^*. \]

By linearity, this actually tells us that a unitary acts on an arbitrary mixed state by conjugation: 

\[ \rho \mapsto U\rho U^*. \]

¹ Note that applying the probabilities p_i to the vectors |φ_i⟩ (rather than to the matrices |φ_i⟩⟨φ_i|) does not make 
sense in general, because Σ_{i=1}^{ℓ} p_i|φ_i⟩ need not be a unit vector. Using square roots of the probabilities also doesn’t 
work, because Σ_{i=1}^{ℓ} √p_i |φ_i⟩ need not be a unit vector either if the |φ_i⟩ are not pairwise orthogonal. 
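
The linearity argument can be checked on a small example. In this sketch the unitary (a Hadamard gate), the two pure states and their probabilities are arbitrary choices for illustration; the check is that conjugating ρ gives the same matrix as mixing the individually evolved pure states.

```python
import numpy as np

U = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)    # Hadamard gate as the unitary
states = [np.array([1.0, 0.0]), np.array([0.6, 0.8])]   # two unit vectors
probs = [0.25, 0.75]

rho = sum(p * np.outer(v, v.conj()) for p, v in zip(probs, states))

# Conjugation of the density matrix: rho -> U rho U*
evolved_by_conjugation = U @ rho @ U.conj().T

# Mixture of the evolved pure states U|phi_i>
evolved_termwise = sum(p * np.outer(U @ v, (U @ v).conj()) for p, v in zip(probs, states))
```

Both expressions agree entry by entry, and the result still has trace 1.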


What about measurements? Recall from Section 1.2.2 that an m-outcome projective measurement 
corresponds to m orthogonal projectors P_1, ..., P_m that satisfy Σ_{i=1}^{m} P_i = I. When applying this 
measurement to a mixed state ρ, the probability to see outcome i is given by p_i = Tr(P_iρ). If we 
get outcome i, then the state collapses to P_iρP_i/p_i (the division by p_i renormalizes the state to have 
trace 1). This may look weird, but let’s recover our familiar measurement in the computational 
basis in this framework. Suppose we measure a state |φ⟩ = Σ_{j=1}^{d} α_j|j⟩ using d projectors P_i = |i⟩⟨i| 
(note that Σ_i P_i is the identity on the d-dimensional space). The probability to get outcome i is 
given by p_i = Tr(P_i|φ⟩⟨φ|) = |⟨i|φ⟩|^2 = |α_i|^2. If we get outcome i then the state collapses to 
P_i|φ⟩⟨φ|P_i/p_i = α_i|i⟩⟨i|α_i^*/p_i = |i⟩⟨i|. This is exactly the measurement in the computational basis 
as we have used it until now. Similarly, a measurement of the first register of a two-register state 
corresponds to projectors P_i = |i⟩⟨i| ⊗ I, where i goes over all basis states of the first register. 

If we only care about the final probability distribution on the m outcomes, not about the 
resulting state, then the most general thing we can do is a POVM. This is specified by m positive 
semidefinite matrices E_1, ..., E_m satisfying Σ_{i=1}^{m} E_i = I. When measuring a state ρ, the probability 
of outcome i is given by Tr(E_iρ). 
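
The two measurement rules above can be sketched in a few lines of NumPy. The helper names measure_probs and collapse, the amplitudes, and the three-outcome "trine" POVM are assumptions chosen for this illustration only.

```python
import numpy as np

def measure_probs(operators, rho):
    """Outcome probabilities Tr(E_i rho); works for projectors and general POVM elements."""
    return [float(np.trace(E @ rho).real) for E in operators]

def collapse(P, rho):
    """Post-measurement state P rho P / Tr(P rho) for a projector P with Tr(P rho) > 0."""
    p = np.trace(P @ rho).real
    return (P @ rho @ P) / p

alpha, beta = np.sqrt(0.3), np.sqrt(0.7)
phi = np.array([alpha, beta])
rho = np.outer(phi, phi.conj())

# Computational-basis measurement as projectors |0><0| and |1><1|
P0, P1 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])
probs = measure_probs([P0, P1], rho)
post0 = collapse(P0, rho)

# A 3-outcome POVM on one qubit: E_k = (2/3)|v_k><v_k| for three symmetric real states
angles = [0.0, 2 * np.pi / 3, 4 * np.pi / 3]
trine = [(2 / 3) * np.outer(v, v) for v in ([np.cos(a / 2), np.sin(a / 2)] for a in angles)]
trine_probs = measure_probs(trine, rho)
```

The projective measurement returns probabilities |α|^2 and |β|^2 and collapses to |0⟩⟨0| on outcome 0. The trine operators sum to the identity but are not projectors, so they only define outcome probabilities.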


15.2 Quantum encodings and their limits 


Quantum information theory studies the quantum generalizations of familiar notions from classical 
information theory such as Shannon entropy, mutual information, channel capacities, etc. Here we 
will discuss a few quantum information-theoretic results that all have the same flavor: they say 
that a low-dimensional quantum state (i.e., a small number of qubits) cannot contain too much 
accessible information. 


Holevo’s Theorem: The mother of all such results is Holevo’s theorem from 1973 [138], which 
predates the area of quantum computing by several decades. Its proper technical statement is 
in terms of a quantum generalization of mutual information, but the following consequence of it 
(derived by Cleve et al. [90]) about two communicating parties, suffices for our purposes. 


Theorem 3 (Holevo, CDNT) Suppose Alice wants to communicate some classical string x to 
Bob. 


• If Alice sends Bob m qubits, and they did not share any prior entanglement, then Bob receives 
at most m bits of information about x. 


• If Alice sends Bob m qubits, and they did share some prior entangled state, then Bob receives 
at most 2m bits of information about x. 


• If Alice sends Bob m classical bits, and they did share some prior entangled state, then Bob 
receives at most m bits of information about x. 


This theorem is slightly imprecisely stated here, but the intuition should be clear: if Bob makes 
any measurement on his state after the communication, then the mutual information between his 
classical outcome and Alice’s x is bounded by m or 2m. In particular, the first part of the theorem 
says that if we encode some classical random variable X in an m-qubit state², then no measurement 
on the quantum state can give more than m bits of information about X. If we encoded the classical 
information in an m-bit system instead of an m-qubit system this would be a trivial statement, 
but the proof of Holevo’s theorem is quite non-trivial. Thus we see that an m-qubit state, despite 
somehow “containing” 2^m complex amplitudes, is no better than m classical bits for the purpose of 
storing or transmitting information. Prior entanglement can improve this by a factor of 2 because 
of superdense coding (see Exercise 1.12), but no more than that. 


Low-dimensional encodings: Here we provide a “poor man’s version” of Holevo’s theorem due 
to Nayak [193, Theorem 2.4.2], which has a simple proof and often suffices for applications. Suppose 
we have a classical random variable X, uniformly distributed over [N] = {1,...,N}.³ Let x ↦ ρ_x 
be some encoding of [N], where ρ_x is a mixed state in a d-dimensional space. Let E_1, ..., E_N be 
the POVM operators applied for decoding; these sum to the d-dimensional identity operator. Then 
the probability of correct decoding in case X = x is 

\[ p_x = \mathrm{Tr}(E_x \rho_x) \le \mathrm{Tr}(E_x). \]

The sum of these success probabilities is at most 

\[ \sum_{x=1}^{N} p_x \le \sum_{x=1}^{N} \mathrm{Tr}(E_x) = \mathrm{Tr}\Big(\sum_{x=1}^{N} E_x\Big) = \mathrm{Tr}(I) = d. \tag{15.1} \]


In other words, if we are encoding one of N classical values in a d-dimensional quantum state, then 
any measurement to decode the encoded classical value has average success probability at most d/N 
(uniformly averaged over all N values that we can encode). For example, if we encode n uniformly 
random bits into m qubits, we will have N = 2^n, d = 2^m, and the average success probability of 
decoding is at most 2^m/2^n, which is very small unless m is nearly n. 
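
A tiny numerical instance of Eq. (15.1) may help. The encoding below (N = 4 values into the qubit states |0⟩, |1⟩, |+⟩, |−⟩, decoded with POVM elements E_x = ρ_x/2) is an illustrative assumption, not an encoding taken from the notes; it happens to meet the bound d/N with equality.

```python
import numpy as np

s = 1 / np.sqrt(2)
states = [np.array([1.0, 0.0]), np.array([0.0, 1.0]),
          np.array([s, s]), np.array([s, -s])]

rhos = [np.outer(v, v) for v in states]     # encoding x -> rho_x, with d = 2
povm = [0.5 * r for r in rhos]              # decoding operators E_x = rho_x / 2

d, N = 2, len(states)
success = [float(np.trace(E @ r)) for E, r in zip(povm, rhos)]   # p_x = Tr(E_x rho_x)
total_success = sum(success)
average_success = total_success / N
```

The four operators sum to the identity, each p_x equals 1/2, and the sum of the p_x equals d = 2, so the average success probability is d/N = 1/2.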


Random access codes: The previous two results dealt with the situation where we encoded a 
classical random variable X in some quantum system, and would like to recover the original value 
X by an appropriate measurement on that quantum system. However, suppose X = X_1 ... X_n is 
a string of n bits, uniformly distributed and encoded by a map x ↦ ρ_x, and it suffices for us if 
we are able to decode individual bits X_i from this with some probability p > 1/2. More precisely, 
for each i ∈ [n] there should exist a measurement {M_i, I − M_i} allowing us to recover x_i. M_i 
would correspond to output 1 and I − M_i to output 0. Hence for each x ∈ {0,1}^n we should have 
Tr(M_iρ_x) ≥ p if x_i = 1 and Tr(M_iρ_x) ≤ 1 − p if x_i = 0. An encoding satisfying this is called a 
quantum random access code, since it allows us to choose which bit of X we would like to access. 
Note that the measurement to recover x_i can change the state ρ_x, so generally we may not be able 
to decode more than one bit of x (also, we cannot copy ρ_x because of the no-cloning theorem, see 
Exercise 1.10). 

An encoding that allows us to recover (with high success probability) an n-bit string requires 
about n qubits by Holevo. Random access codes only allow us to recover each of the n bits. Can 
they be much shorter? In small cases they can be: for instance, one can encode two classical bits 
into one qubit, in such a way that each of the two bits can be recovered with success probability 
85% from that qubit (see Exercise 2). However, Nayak [193] proved that asymptotically quantum 
random access codes cannot be much shorter than classical. 

² Via an encoding map x ↦ ρ_x; we generally use upper-case letters like X to denote random variables, lower-case 
letters like x to denote specific values. 
³ NB: unlike in most of these lecture notes, N need not equal 2^n in this chapter! 


Theorem 4 (Nayak) Let x ↦ ρ_x be a quantum random access encoding of n-bit strings into 
m-qubit states such that, for each i ∈ [n], we can decode X_i from ρ_x with success probability p 
(averaged over a uniform choice of x and the measurement randomness). Then m ≥ (1 − H(p))n, 
where H(p) = −p log p − (1 − p) log(1 − p) is the binary entropy function. 


The intuition of the proof is quite simple: since the quantum state allows us to predict the bit 
X_i with probability p_i, it reduces the “uncertainty” about X_i from 1 bit to H(p_i) bits. Hence it 
contains at least 1 − H(p_i) bits of information about X_i. Since all n X_i’s are independent, the state 
has to contain at least Σ_{i=1}^{n} (1 − H(p_i)) bits of information about X in total. 
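
To get a feel for the bound of Theorem 4, the following sketch evaluates (1 − H(p))n numerically. It assumes logarithms to base 2 (so that entropy is measured in bits); the function names are hypothetical.

```python
import math

def binary_entropy(p):
    """H(p) = -p log p - (1-p) log(1-p), with logs to base 2 and H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def nayak_lower_bound(n, p):
    """Lower bound (1 - H(p)) * n on the number of qubits m in Theorem 4."""
    return (1 - binary_entropy(p)) * n

bound_two_bits = nayak_lower_bound(2, 0.85)       # n = 2 bits, success probability 0.85
bound_thousand_bits = nayak_lower_bound(1000, 0.85)
```

For n = 2 and p = 0.85 the bound is below 1, consistent with the 2-bits-into-1-qubit code mentioned above; for n = 1000 it already forces several hundred qubits, which is the asymptotic point of the theorem.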


15.3 Lower bounds on locally decodable codes 


Here we will give an application of quantum information theory to a classical problem.⁴ 

The development of error-correcting codes is one of the success stories of science in the second 
half of the 20th century. Such codes are eminently practical, and are widely used to protect 
information stored on discs, communication over channels, etc. From a theoretical perspective, 
there exist codes that are nearly optimal in a number of different respects simultaneously: they 
have constant rate, can protect against a constant noise-rate, and have linear-time encoding and 
decoding procedures. We refer to Trevisan’s survey [236] for a complexity-oriented discussion of 
codes and their applications. 

One drawback of ordinary error-correcting codes is that we cannot efficiently decode small 
parts of the encoded information. If we want to learn, say, the first bit of the encoded message 
then we usually still need to decode the whole encoded string. This is relevant in situations where 
we have encoded a very large string (say, a library of books, or a large database), but are only 
interested in recovering small pieces of it at any given time. Dividing the data into small blocks 
and encoding each block separately will not work: small chunks will be efficiently decodable but 
not error-correcting, since a tiny fraction of well-placed noise could wipe out the encoding of one 
chunk completely. There exist, however, error-correcting codes that are locally decodable, in the 
sense that we can efficiently recover individual bits of the encoded string. 


Definition 3 C : {0,1}^n → {0,1}^N is a (q, δ, ε)-locally decodable code (LDC) if there is a classical 
randomized decoding algorithm A such that 


1. A makes at most q queries to an N-bit string y. 


2. For all x ∈ {0,1}^n and i ∈ [n], and all y ∈ {0,1}^N with Hamming distance d(C(x), y) ≤ δN 
we have Pr[A^y(i) = x_i] ≥ 1/2 + ε. 


Here δ is an upper bound on the fraction of bits of the codeword that may have been corrupted 
(by some noise process, or by our worst enemy), and ε is a lower bound on the advantage we have 
compared to just randomly guessing the value of the bit x_i. The notation A^y(i) reflects that the 
decoder A has two different types of input. On the one hand there is the (possibly corrupted) 
codeword y, to which the decoder has oracle access and from which it can read at most q bits of 
its choice. On the other hand there is the index i of the bit that needs to be recovered, and which 
is known fully to the decoder. 

⁴ There is a growing number of such applications of quantum tools to non-quantum problems. See [100] for a 
survey. 
The main question about LDCs is the tradeoff between the codelength N and the number of 
queries q (which is a proxy for the decoding-time). This tradeoff is still not very well understood. 
The only case where we know the answer is the case of q = 2 queries.⁵ For q = 2 there is the 
Hadamard code: given x ∈ {0,1}^n, define a codeword of length N = 2^n by writing down the bits 
x·z mod 2, for all z ∈ {0,1}^n, with the z’s ordered in lexicographic order. For example for n = 2 
and x = 10, the codeword would be 


\[ C(x) = (x\cdot 00,\ x\cdot 01,\ x\cdot 10,\ x\cdot 11) = 0011. \]


One can decode x_i with 2 queries as follows: choose z ∈ {0,1}^n uniformly at random, query the 
(possibly corrupted) codeword at indices z and z ⊕ e_i (where the latter denotes the string obtained 
from z by flipping its i-th bit), and output the sum of the two returned bits modulo 2. Individually, 
each of these two indices z and z ⊕ e_i is uniformly distributed. Hence for each of them, the 
probability that the returned bit is corrupted is at most δ. By the union bound, with probability 
at least 1 − 2δ, both queries return the uncorrupted values. Adding these two bits mod 2 gives the 
correct answer: 


\[ C(x)_z \oplus C(x)_{z \oplus e_i} = (x\cdot z) \oplus (x\cdot(z \oplus e_i)) = x\cdot e_i = x_i. \]


Thus the Hadamard code is a (2, δ, 1/2 − 2δ)-LDC of exponential length. 
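
The encoder and the 2-query decoder described above are short enough to write out. This is a minimal sketch: bit strings are represented as tuples of 0/1, positions in the codeword follow the lexicographic order of z, the index i is 0-based, and all function names are hypothetical.

```python
import random

def inner(x, z):
    """Inner product mod 2 of two equal-length bit tuples."""
    return sum(a & b for a, b in zip(x, z)) % 2

def index_to_bits(j, n):
    """The n-bit string at lexicographic position j (most significant bit first)."""
    return tuple((j >> (n - 1 - t)) & 1 for t in range(n))

def bits_to_index(z):
    """Inverse of index_to_bits."""
    j = 0
    for b in z:
        j = (j << 1) | b
    return j

def hadamard_encode(x):
    """Codeword of length 2^n holding x.z mod 2 for every z in {0,1}^n."""
    n = len(x)
    return [inner(x, index_to_bits(j, n)) for j in range(2 ** n)]

def decode_bit(y, n, i, rng=random):
    """Two-query decoder for x_i from a (possibly corrupted) word y."""
    z = tuple(rng.randrange(2) for _ in range(n))
    z_flipped = tuple(b ^ (t == i) for t, b in enumerate(z))   # z XOR e_i
    return y[bits_to_index(z)] ^ y[bits_to_index(z_flipped)]

example_codeword = hadamard_encode((1, 0))   # the n = 2, x = 10 example from the text
```

On an uncorrupted codeword the decoder is always correct; on a word with at most a δ fraction of flipped positions it errs only if one of its two uniformly distributed queries hits a corrupted position.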

The only superpolynomial lower bound known on the length of LDCs is for the case of 2 queries: 
there one needs an exponential codelength and hence the Hadamard code is essentially optimal. 
This is shown via a quantum argument [152]—despite the fact that the result is a purely classical 
result, about classical codes and classical decoders. The easiest way to present this argument is to 
assume the following fact, which states a kind of “normal form” for the decoder. 


Fact 1 (Katz & Trevisan [148] + folklore) For every (q, δ, ε)-LDC C : {0,1}^n → {0,1}^N, 
and for each i ∈ [n], there exists a set M_i of Ω(δεN/q^2) disjoint tuples, each of at most q indices 
from [N], and a bit a_{i,t} for each tuple t ∈ M_i, such that the following holds: 


\[ \Pr_{x \in \{0,1\}^n}\Big[ x_i = a_{i,t} \oplus \sum_{j \in t} C(x)_j \Big] \ge \frac{1}{2} + \Omega(\varepsilon/2^q), \tag{15.2} \]


where the probability is taken uniformly over x. Hence to decode x_i from C(x), the decoder can just 
query the indices in a randomly chosen tuple t from M_i, outputting the sum of those q bits and 
a_{i,t}. 

Note that the above decoder for the Hadamard code is already of this form, with M_i consisting of 
the 2^{n−1} pairs {z, z ⊕ e_i}. We omit the fairly easy proof of Fact 1, which uses purely classical ideas. 


⁵ For q = 1, LDCs don’t exist once n is sufficiently large [148]. For q = 3, the best upper bound known on the 
codelength N is roughly 2^{2^{√(log n)}} [252, 104], while the best lower bound is roughly n^3 [10]. 
Now suppose C : {0,1}^n → {0,1}^N is a (2, δ, ε)-LDC. We want to show that the codelength 
N must be exponentially large in n. Our strategy is to show that the following N-dimensional 
quantum encoding is a quantum random access code for x (with some success probability p > 1/2): 

\[ x \mapsto |\phi_x\rangle = \frac{1}{\sqrt{N}} \sum_{j=1}^{N} (-1)^{C(x)_j} |j\rangle. \]

Theorem 4 then implies that the number of qubits of this state (which is ⌈log N⌉) is at least 
(1 − H(p))n = Ω(n), and we are done. 

Suppose we want to recover x_i from |φ_x⟩. We’ll do this by a sequence of two measurements, as 
follows. We turn each M_i from Fact 1 into a projective measurement: for each pair (j,k) ∈ M_i 
form the projector P_{jk} = |j⟩⟨j| + |k⟩⟨k|, and let P_rest = Σ_{j ∉ ∪_{t∈M_i} t} |j⟩⟨j| be the projector on the 
remaining indices. These |M_i| + 1 projectors sum to the N-dimensional identity matrix, so they 
form a valid projective measurement. Applying this to |φ_x⟩ gives outcome (j,k) with probability 
‖P_{jk}|φ_x⟩‖^2 = 2/N for each (j,k) ∈ M_i. There are |M_i| = Ω(δεN) different (j,k)-pairs in M_i, so 
the probability to see one of those as outcome of the measurement is |M_i| · 2/N = Ω(δε). With 
the remaining probability r = 1 − Ω(δε), we’ll get “rest” as outcome of the measurement. In the 
latter case we didn’t get anything useful from the measurement, so we’ll just output a fair coin flip 
as our guess for x_i (then the output will equal x_i with probability exactly 1/2). In case we got one 
of the (j,k) as measurement outcome, the state has collapsed to the following useful superposition: 


\[ \frac{1}{\sqrt{2}}\Big( (-1)^{C(x)_j}|j\rangle + (-1)^{C(x)_k}|k\rangle \Big) = \frac{(-1)^{C(x)_j}}{\sqrt{2}}\Big( |j\rangle + (-1)^{C(x)_j \oplus C(x)_k}|k\rangle \Big). \]


We know what j and k are, because it is the outcome of the measurement on |φ_x⟩. Now do a 
2-outcome projective measurement with projectors P_0 and P_1 corresponding to the two vectors 
(1/√2)(|j⟩ + |k⟩) and (1/√2)(|j⟩ − |k⟩), respectively. The measurement outcome equals the value 
C(x)_j ⊕ C(x)_k with probability 1. By Eq. (15.2), if we add the bit a_{i,(j,k)} to this, we get x_i with probability 
at least 1/2 + Ω(ε). The success probability of recovering x_i, averaged over all x, is 


\[ p \ge \frac{1}{2}\, r + \Big( \frac{1}{2} + \Omega(\varepsilon) \Big)(1 - r) = \frac{1}{2} + \Omega(\delta\varepsilon^2). \]


Thus we have constructed a random access code that encodes n bits into log N qubits, and has 
success probability at least p. Applying Theorem 4 and using that 


\[ 1 - H(1/2 + \eta) = \Theta(\eta^2) \quad \text{for } \eta \in [0, 1/2], \tag{15.3} \]

with η = p − 1/2 = Ω(δε^2), we obtain the following: 


Theorem 5 If C : {0,1}^n → {0,1}^N is a (2, δ, ε)-locally decodable code, then N ≥ 2^{Ω(δ^2 ε^4 n)}. 
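
The two-step measurement used in the proof can be simulated directly on the amplitude vector. The sketch below is illustrative only: the 6-bit string stands in for some codeword C(x) (it is not a codeword of any particular code), indices are 0-based, and the function names are hypothetical.

```python
import numpy as np

def phi_x(codeword):
    """Amplitudes of |phi_x> = (1/sqrt(N)) * sum_j (-1)^{C(x)_j} |j>."""
    c = np.asarray(codeword)
    return (-1.0) ** c / np.sqrt(len(c))

def pair_outcome_probability(state, j, k):
    """Probability that the first measurement returns the pair (j, k)."""
    return float(state[j] ** 2 + state[k] ** 2)

def parity_probabilities(state, j, k):
    """Outcome distribution of the second measurement, given first outcome (j, k)."""
    collapsed = np.array([state[j], state[k]])
    collapsed = collapsed / np.linalg.norm(collapsed)
    plus = np.array([1.0, 1.0]) / np.sqrt(2)     # outcome 0: (|j> + |k>)/sqrt(2)
    minus = np.array([1.0, -1.0]) / np.sqrt(2)   # outcome 1: (|j> - |k>)/sqrt(2)
    return float((plus @ collapsed) ** 2), float((minus @ collapsed) ** 2)

codeword = [0, 1, 1, 0, 1, 0]        # arbitrary bits standing in for C(x), N = 6
state = phi_x(codeword)
p_pair = pair_outcome_probability(state, 1, 3)
p_parity0, p_parity1 = parity_probabilities(state, 1, 3)
```

Here the pair (1, 3) is seen with probability 2/N = 1/3, and since the bits at positions 1 and 3 differ, the second measurement returns parity 1 with certainty, as the proof claims.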


Exercises 


1. Suppose we have a qubit whose density matrix is ρ. 


(a) Show that there exist real numbers r_0, r_1, r_2, r_3 such that ρ = (r_0/2)I + (r_1/2)X + (r_2/2)Y + (r_3/2)Z, 
where I, X, Y, Z are the Pauli matrices (see Appendix A.9). 

(b) Show that r_0 = 1. 

(c) Show that Tr(ρ^2) = (1/2)(r_0^2 + r_1^2 + r_2^2 + r_3^2). 

(d) Show that r_1^2 + r_2^2 + r_3^2 ≤ 1. 

(e) Show that r_1^2 + r_2^2 + r_3^2 = 1 iff ρ is a pure state. 


Comment: One can represent a qubit by the corresponding vector (r_1, r_2, r_3) ∈ R^3. By part (e) the pure states 
are exactly the ones that are on the boundary of the 3-dimensional sphere of radius 1. The mixed states are 
in the interior of the sphere, and the maximally mixed state ρ = I/2 is at the origin (0,0,0). This geometric 
picture is called the Bloch-sphere representation of a qubit, and is very useful in physics. For example, single- 
qubit gates correspond to rotations on this sphere. Unfortunately this picture does not generalize cleanly to 
more than one qubit. 


2. (a) (H) Give a quantum random access code that encodes 2 classical bits into 1 qubit, such 
that each of the two classical bits can be recovered from the quantum encoding with 
success probability p > 0.85. 

(b) Prove an upper bound of 1/2 + O(1/√n) on the success probability p for a random access 
code that encodes n classical bits into 1 qubit. 


3. (H) Teleportation transfers an arbitrary unknown qubit from Alice to Bob, using 1 EPR-pair 
and 2 classical bits of communication from Alice to Bob (see Section 1.5). Prove that these 
2 bits of communication are necessary, i.e., you cannot teleport an arbitrary unknown qubit 
using 1 EPR-pair and only 1 classical bit of communication. 


4. Suppose n + 1 = 2^k for some integer k. For ℓ ∈ {0,...,n} define n-qubit state 


\[ |\psi_\ell\rangle = \frac{1}{\sqrt{\binom{n}{\ell}}} \sum_{x \in \{0,1\}^n : |x| = \ell} |x\rangle, \]


where |x| denotes the Hamming weight (number of 1s) in x. 


(a) Show that ⟨ψ_ℓ|ψ_ℓ'⟩ equals 1 if ℓ = ℓ', and equals 0 otherwise. 


(b) Consider a qubit |φ⟩ = α|0⟩ + β|1⟩. Show that the n-qubit state |φ⟩^{⊗n} can be written as 
a linear combination of the states |ψ_ℓ⟩. Say explicitly what the coefficients of this linear 
combination are. 


(c) Give a unitary V, independent of α, β, that encodes |φ⟩^{⊗n} into a k-qubit state |ψ⟩ in 
the sense that 

\[ V : |\phi\rangle^{\otimes n} \mapsto |\psi\rangle \otimes |0^{n-k}\rangle. \]


Say explicitly what your state |ψ⟩ is and how it depends on α and β (you’re not required 
to write out circuits). 


5. Consider the Hadamard code C that encodes n = 2 bits x_1x_2 into a codeword of N = 4 bits. 


(a) Give the 4-bit codeword C(11). 


(b) What are the states |φ_x⟩ that arise as quantum random access code when we apply the 
LDC lower bound proof of Section 15.3 to C? Give the 4 states, not one general formula. 

(c) What is the measurement used for recovering x_2 from |φ_x⟩ at the end of that proof? 
You may either describe this as a sequence of two projective measurements, or as one 
(combined) projective measurement. 


6. (a) Let x ∈ {0,1}^n. Suppose we apply the 2n-qubit Fourier transform F_{2^{2n}} on the 2n-bit 
basis state |x⟩|0^n⟩, followed by F_{2^n}^{-1} on the last n qubits (and identity on the first n 
qubits). Show that we end up with the 2n-qubit state |+⟩^{⊗n}|x⟩. 


(b) (H) Consider a circuit C that implements F_{2^{2n}} in some way using arbitrary 1-qubit and 
2-qubit gates (C can do anything, it need not be one of the specific QFT circuits from 
the lecture notes). Show that there must be Ω(n) two-qubit gates in C where the control 
bit lies in the first n qubits of the state and the target qubit lies in the second n qubits 
(or vice versa). 


7. Suppose there are two classically-known mixed states ρ_0 and ρ_1, and we are given one copy of 
quantum state ρ_b for a uniformly random b ∈ {0,1}. We want to learn b using some 2-outcome 
projective measurement with operators P_0 and P_1, which we can choose ourselves depending 
on what ρ_0 and ρ_1 are. The success probability of such a measurement is (1/2)(Tr(P_0ρ_0) + 
Tr(P_1ρ_1)). 


(a) (H) Give a projective measurement with success probability ≥ 1/2 + (1/4)‖ρ_0 − ρ_1‖_1, where 
the norm ‖A‖_1 of a matrix A is defined as the sum of A’s singular values. 


(b) Show that every 2-outcome projective measurement P_0, P_1 has a success probability that 
is ≤ 1/2 + (1/4)‖ρ_0 − ρ_1‖_1. 
Chapter 16 


Quantum Communication Complexity 


Communication complexity was first introduced by Yao [250]. It has been studied extensively in 
theoretical computer science and has deep connections with seemingly unrelated areas, 
such as VLSI design, circuit lower bounds, lower bounds on branching programs, sizes of data 
structures, and bounds on the length of logical proof systems, to name just a few. 


16.1 Classical communication complexity 


First we sketch the setting for classical communication complexity. Alice and Bob want to compute 
some function f : D → {0,1}, where D ⊆ X × Y.¹ Alice receives input x ∈ X, Bob receives input 
y ∈ Y, with (x,y) ∈ D. A typical situation, illustrated in Fig. 16.1, is where X = Y = {0,1}^n, 
so both Alice and Bob receive an n-bit input string. As the value f(x,y) will generally depend on 
both x and y, some communication between Alice and Bob is required in order for them to be able 
to compute f(x,y). We are interested in the minimal amount of communication they need. 


[Figure 16.1: Alice and Bob solving a communication complexity problem. Alice holds input x ∈ {0,1}^n, Bob holds input y ∈ {0,1}^n; they communicate, and the output is f(x,y).] 


A communication protocol is a distributed algorithm where first Alice does some individual 
computation, and then sends a message (of one or more bits) to Bob, then Bob does some compu- 
tation and sends a message to Alice, etc. Each message is called a round. After one or more rounds 
the protocol terminates and one of the parties (let’s say Bob) outputs some value that should be 
f(x,y). The cost of a protocol is the total number of bits communicated on the worst-case input. 
A deterministic protocol for f always has to output the right value f(x,y) for all (x,y) ∈ D. In a 
bounded-error protocol, Alice and Bob may flip coins and the protocol has to output the right value 
f(x,y) with probability > 2/3 for all (x,y) ∈ D. We could either allow Alice and Bob to toss coins 
individually (local randomness, or “private coin”) or jointly (shared randomness, or “public coin”). 
A public coin can simulate a private coin and is potentially more powerful. However, Newman’s 
theorem [194] says that having a public coin can save at most O(log n) bits of communication, 
compared to a protocol with a private coin. 

¹ If the domain D equals X × Y then f is called a total function, otherwise it is called a partial or promise function. 

To illustrate the power of randomness, let us give a simple yet efficient bounded-error protocol 
for the equality problem, where the goal for Alice is to determine whether her n-bit input is the 
same as Bob’s or not: f(x,y) = 1 if x = y, and f(x,y) = 0 otherwise. Alice and Bob jointly 
toss a random string $r \in \{0,1\}^n$. Alice sends the bit $a = x \cdot r$ to Bob (where ‘$\cdot$’ is inner product
mod 2). Bob computes $b = y \cdot r$ and compares this with a. If x = y then a = b, but if $x \neq y$ then
$a \neq b$ with probability 1/2. Repeating this a few times, Alice and Bob can decide equality with
small error probability using O(n) public coin flips and a constant amount of communication. This 
protocol uses public coins, but note that Newman’s theorem implies that there exists an O(log n)- 
bit protocol that uses a private coin (see Exercise 9 for an explicit protocol). Note that the correct 
output of the equality function depends on all n bits of x, but Bob does not need to learn all n bits 
of x in order to be able to decide equality with high success probability. In contrast, one can show 
that deterministic protocols for the equality problem need n bits of communication, so then Alice 
might as well just send x to Bob. 
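The public-coin protocol above can be sketched in a few lines of Python. This is only an illustrative simulation: the function names `inner_product_mod2` and `public_coin_equality`, the default of 20 repetitions, and the use of Python's `random` module as the shared coin are assumptions made for the example, not part of the original text.

```python
import random

def inner_product_mod2(u, v):
    """Inner product of two bit lists modulo 2."""
    return sum(a & b for a, b in zip(u, v)) % 2

def public_coin_equality(x, y, repetitions=20, rng=None):
    """Return 1 if the protocol declares x == y, and 0 otherwise.

    Each repetition costs one bit of communication (Alice's bit a).
    """
    rng = rng or random.Random()
    for _ in range(repetitions):
        r = [rng.randint(0, 1) for _ in x]   # shared random string (public coin)
        a = inner_product_mod2(x, r)          # Alice's one-bit message
        b = inner_product_mod2(y, r)          # Bob's local bit
        if a != b:
            return 0                          # a != b can only happen when x != y
    return 1                                  # if x != y, this is wrong w.p. 2**-repetitions

x = [1, 0, 1, 1, 0, 0, 1, 0]
y = [1, 0, 1, 1, 0, 1, 1, 0]                  # differs from x in one position
print(public_coin_equality(x, x), public_coin_equality(x, y))
```

The error is one-sided: equal inputs are never rejected, and unequal inputs are accepted only if all repetitions happen to give a = b.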


16.2 The quantum question 


Now what happens if we give Alice and Bob a quantum computer and allow them to send each 
other qubits and/or to make use of EPR-pairs that they share at the start of the protocol? 

Formally speaking, we can model a quantum protocol as follows. The total state consists 
of 3 parts: Alice’s private space, the channel, and Bob’s private space. The starting state is 
$|x\rangle|0\rangle|y\rangle$: Alice gets x, the channel is initialized to 0, and Bob gets y. Now Alice applies a unitary
transformation to her space and the channel. This corresponds to her private computation as well 
as to putting a message on the channel (the length of this message is the number of channel-qubits 
affected by Alice’s operation). Then Bob applies a unitary transformation to his space and the 
channel, etc. At the end of the protocol Alice or Bob makes a measurement to determine the 
output of the protocol. This model was introduced by Yao [251]. 

In the second model, introduced by Cleve and Buhrman [89], Alice and Bob share an unlimited 
number of EPR-pairs at the start of the protocol, but now they communicate via a classical channel: 
the channel has to be in a classical state throughout the protocol. We only count the communication, 
not the number of EPR-pairs used. Protocols of this kind can simulate protocols of the first kind 
with only a factor 2 overhead: using teleportation, the parties can send each other a qubit using 
an EPR-pair and two classical bits of communication. Hence the qubit-protocols that we describe 
below also immediately yield protocols that work with entanglement and a classical channel. Note 
that an EPR-pair can simulate a public coin toss: if Alice and Bob each measure their half of the 
pair of qubits, they get the same random bit. 

The third variant combines the strengths of the other two: here Alice and Bob start out with 
an unlimited number of EPR-pairs and they are allowed to communicate qubits. This third kind 
of communication complexity is in fact equivalent to the second, up to a factor of 2, again by
teleportation.

Before continuing to study this model, we first have to face an important question: is there 
anything to be gained here? At first sight, the following argument seems to rule out any significant 
gain. Suppose that in the classical world k bits have to be communicated in order to compute f. 
Since Holevo’s theorem says that k qubits cannot contain more information than k classical bits, it 
seems that the quantum communication complexity should be roughly k qubits as well (maybe k/2 
to account for superdense coding, but not less). Surprisingly (and fortunately for us), this argument 
is false, and quantum communication can sometimes be much less than classical communication 
complexity. The information-theoretic argument via Holevo’s theorem fails, because Alice and 
Bob do not need to communicate the information in the k bits of the classical protocol; they are 
only interested in the value f(x,y), which is just 1 bit. Below we will go over four of the main 
examples that have so far been found of differences between quantum and classical communication 
complexity. 


16.3 Example 1: Distributed Deutsch-Jozsa 


The first impressively large gaps between quantum and classical communication complexity were 
exhibited by Buhrman, Cleve, and Wigderson [74]. Their protocols are distributed versions of 
known quantum query algorithms, like the Deutsch-Jozsa and Grover algorithms. Let us start 
with the first one. It is actually explained most easily in a direct way, without reference to the 
Deutsch-Jozsa algorithm (though that is where the idea came from). The problem is a promise 
version of the equality problem. Suppose the n-bit inputs x and y are restricted to the following 
case: 


Distributed Deutsch-Jozsa: either x = y, or x and y differ in exactly n/2 positions 


Note that this promise only makes sense if n is an even number, otherwise n/2 would not be integer. 
In fact it will be convenient to assume n is a power of 2. Here is a simple quantum protocol to
solve this promise version of equality using only log n qubits of communication:


1. Alice sends Bob the log n-qubit state $\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(-1)^{x_i}|i\rangle$, which she can prepare unitarily from
x and log n $|0\rangle$-qubits.


2. Bob applies the unitary map $|i\rangle \mapsto (-1)^{y_i}|i\rangle$ to the state, applies a Hadamard transform to
each qubit (for this it is convenient to view i as a log n-bit string), and measures the resulting
log n-qubit state.


3. Bob outputs 1 if the measurement gave $|0^{\log n}\rangle$ and outputs 0 otherwise.


It is clear that this protocol only communicates log n qubits, but why does it work? Note that the
state that Bob measures is

$$H^{\otimes \log n}\left(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(-1)^{x_i+y_i}|i\rangle\right) = \frac{1}{n}\sum_{i=1}^{n}(-1)^{x_i+y_i}\sum_{j\in\{0,1\}^{\log n}}(-1)^{i\cdot j}|j\rangle.$$

This superposition looks rather unwieldy, but consider the amplitude of the $|0^{\log n}\rangle$ basis state. It
is $\frac{1}{n}\sum_{i=1}^{n}(-1)^{x_i+y_i}$, which is 1 if x = y and 0 otherwise because the promise now guarantees that
x and y differ in exactly n/2 of the bits! Hence Bob will always give the correct answer.

What about efficient classical protocols (without entanglement) for this problem? Proving
lower bounds on communication complexity often requires a very technical combinatorial analysis.
Buhrman, Cleve, and Wigderson used a deep combinatorial result from [114] to prove that every
classical errorless protocol for this problem needs to send at least 0.007n bits.
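The quantum protocol of this section is small enough to check numerically. The following is a minimal state-vector sketch in plain Python; the helper names `walsh_hadamard` and `distributed_deutsch_jozsa` are hypothetical, and indices run from 0 to n-1 rather than 1 to n. It tracks the n amplitudes of the log n-qubit message directly, so it illustrates correctness only, not the communication saving.

```python
import math

def walsh_hadamard(amplitudes):
    """Apply a Hadamard gate to every qubit of a state vector (length a power of 2)."""
    state = list(amplitudes)
    n = len(state)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for j in range(start, start + h):
                u, v = state[j], state[j + h]
                state[j], state[j + h] = u + v, u - v
        h *= 2
    return [a / math.sqrt(n) for a in state]

def distributed_deutsch_jozsa(x, y):
    """Bob's output (1 for x == y, 0 for 'differ in n/2 positions') under the promise."""
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "n must be a power of 2"
    message = [(-1) ** xi / math.sqrt(n) for xi in x]            # step 1: Alice's state
    phased = [(-1) ** yi * amp for yi, amp in zip(y, message)]   # step 2: Bob's phase map
    final = walsh_hadamard(phased)                               # step 2: Hadamard on each qubit
    prob_all_zero = final[0] ** 2                                # probability of |0...0>
    return 1 if prob_all_zero > 0.5 else 0                       # step 3

x = [0, 1, 1, 0, 1, 0, 0, 1]
y_far = [1, 0, 0, 1, 1, 0, 0, 1]     # differs from x in exactly n/2 = 4 positions
print(distributed_deutsch_jozsa(x, x), distributed_deutsch_jozsa(x, y_far))
```

Under the promise the probability of the all-zero outcome is exactly 0 or 1 in theory; the threshold at 0.5 only absorbs floating-point rounding.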

This log n-qubits-vs-0.007n-bits example was the first exponentially large separation of quantum 
and classical communication complexity. Notice, however, that the difference disappears if we move 
to the bounded-error setting, allowing the protocol to have some small error probability. We can 
use the randomized protocol for equality discussed above or even simpler: Alice can just send a few
$(i, x_i)$ pairs to Bob, who then compares the $x_i$’s with his $y_i$’s. If x = y he will not see a difference,
but if x and y differ in n/2 positions, then Bob will probably detect this. Hence O(log n) classical
bits of communication suffice in the bounded-error setting, in sharp contrast to the errorless setting.
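As a rough illustration of the sampling idea, here is a small Python sketch. The function name `sampling_protocol` and the default of 20 samples are assumptions for the example. Under the promise, each sampled index exposes a difference with probability 1/2 when x and y differ in n/2 positions, so k samples fail with probability $2^{-k}$.

```python
import random

def sampling_protocol(x, y, samples=20, rng=None):
    """Return 1 if no sampled position differs (declare x == y), else 0."""
    rng = rng or random.Random()
    n = len(x)
    for _ in range(samples):
        i = rng.randrange(n)       # Alice sends (i, x[i]): about log2(n) + 1 bits
        if x[i] != y[i]:           # Bob compares with his own bit
            return 0
    return 1

x = [0, 1, 1, 0, 1, 0, 0, 1]
y_far = [1, 0, 0, 1, 1, 0, 0, 1]   # differs from x in exactly n/2 positions
print(sampling_protocol(x, x), sampling_protocol(x, y_far))
```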


16.4 Example 2: The Intersection problem 


Now consider the Intersection function, which is 1 if $x_i = y_i = 1$ for at least one i. Buhrman, Cleve,
and Wigderson [74] also presented an efficient quantum protocol for this, based on Grover’s search
algorithm (Chapter 7). We can solve Intersection if we can solve the following search problem: find
some i such that $x_i = y_i = 1$, if such an i exists.² We want to find a solution to the search problem
on the string $z = x \wedge y$ (which is the bit-wise AND of x and y), since $z_i = 1$ whenever both $x_i = 1$
and $y_i = 1$. The idea is now to let Alice run Grover’s algorithm to search for such a solution.
Clearly, she can prepare the uniform starting state herself. She can also apply the unitaries H and
R herself. The only thing where she needs Bob’s help, is in implementing the phase-query $O_{z,\pm}$
(which she needs to do $O(\sqrt{n})$ times, because that’s how many queries Grover makes). Alice and
Bob can together implement a phase-query as follows. Whenever Alice wants to apply $O_{z,\pm}$ to a
state

$$|\phi\rangle = \sum_{i=1}^{n} \alpha_i |i\rangle,$$

she tags on her $x_i$’s in an extra qubit (which she can do by the unitary map $|i\rangle|0\rangle \mapsto |i\rangle|x_i\rangle$) and
sends Bob the state

$$\sum_{i=1}^{n} \alpha_i |i\rangle|x_i\rangle.$$

Bob applies the unitary map

$$|i\rangle|x_i\rangle \mapsto (-1)^{x_i \wedge y_i}|i\rangle|x_i\rangle$$

and sends back the result. Alice sets the last qubit back to $|0\rangle$ (which she can do unitarily because
she has x), and now she has the state $O_{z,\pm}|\phi\rangle$! Thus we can simulate $O_{z,\pm}$ using 2 messages of
log(n) + 1 qubits each. Thus Alice and Bob can run Grover’s algorithm to find an intersection,
using $O(\sqrt{n})$ messages of O(log n) qubits each, for total communication of $O(\sqrt{n}\log n)$ qubits.
Later Aaronson and Ambainis [3] gave a more complicated protocol that uses $O(\sqrt{n})$ qubits of
communication.
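The three-step simulation of the phase-query (Alice tags, Bob adds the phase, Alice untags) can be traced on explicit amplitudes. The sketch below is illustrative only: the names `xor_tag`, `bob_phase` and `distributed_phase_query` are hypothetical, and the state is stored as a list of pairs, where entry i holds the amplitudes of $|i\rangle|0\rangle$ and $|i\rangle|1\rangle$.

```python
def xor_tag(state, x):
    """Alice's map |i>|b> -> |i>|b XOR x_i>; applying it twice undoes it."""
    return [[pair[x_i], pair[1 - x_i]] for pair, x_i in zip(state, x)]

def bob_phase(state, y):
    """Bob's map |i>|b> -> (-1)^(b AND y_i) |i>|b>."""
    return [[pair[0], -pair[1] if y_i else pair[1]] for pair, y_i in zip(state, y)]

def distributed_phase_query(phi, x, y):
    """Return the amplitudes of the phase-queried state, with z = x AND y."""
    tagged = xor_tag([[amp, 0.0] for amp in phi], x)   # message 1: Alice -> Bob
    phased = bob_phase(tagged, y)                      # Bob's local operation
    untagged = xor_tag(phased, x)                      # message 2 received; Alice resets the tag
    assert all(pair[1] == 0 for pair in untagged)      # extra qubit is back in |0>
    return [pair[0] for pair in untagged]

phi = [0.5, 0.5, 0.5, 0.5]
x = [1, 1, 0, 0]
y = [1, 0, 1, 0]
print(distributed_phase_query(phi, x, y))   # only index 0 has x_i = y_i = 1
```

Only the amplitude at an index with $x_i = y_i = 1$ changes sign, which is exactly the action of a phase-query to $z = x \wedge y$.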

What about lower bounds? It is a well-known result of classical communication complexity that
classical bounded-error protocols for the Intersection problem need about n bits of communication.
Thus we have a quadratic quantum-classical separation for this problem. Could there be a quantum
protocol that uses much less than $\sqrt{n}$ qubits of communication? This question was open for quite a
few years after [74] appeared, until finally Razborov [202] showed that any bounded-error quantum
protocol for Intersection needs to communicate about $\sqrt{n}$ qubits.


²This is sometimes called the appointment-scheduling problem: view x and y as Alice’s and Bob’s agendas, respectively,
with a 1 at the i-th bit indicating that timeslot i is available. Then the goal is to find a timeslot where Alice
and Bob are both available, so they can schedule an appointment.


16.5 Example 3: The vector-in-subspace problem 


Notice the contrast between the examples of the last two sections. For the Distributed Deutsch- 
Jozsa problem we get an exponential quantum-classical separation, but the separation only holds 
if we require the classical protocol to be errorless. On the other hand, the gap for the disjointness 
function is only quadratic, but it holds even if we allow classical protocols to have some error 
probability. 

Here is a function where the quantum-classical separation has both features: the quantum 
protocol is exponentially better than the classical protocol, even if the latter is allowed some error: 


Alice receives a unit vector $v \in \mathbb{R}^m$

Bob receives two m-dimensional projectors $P_0$ and $P_1$ such that $P_0 + P_1 = I$
Promise: either $P_0 v = v$ or $P_1 v = v$.

Question: which of the two?


As stated, this is a problem with continuous input, but it can be discretized in a natural way by
approximating each real number by O(log m) bits. Alice and Bob’s input is now $n = O(m^2 \log m)$
bits long. There is a simple yet efficient 1-round quantum protocol for this problem: Alice views v
as a log m-qubit state and sends this to Bob; Bob measures with operators $P_0$ and $P_1$, and outputs
the measurement result (0 or 1). This takes only log m = O(log n) qubits of communication, and
Bob’s output is correct with probability 1 thanks to the promise on the inputs.
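To see why the promise gives success probability 1, one can compute the outcome probabilities $\|P_b v\|^2$ for a small instance. The sketch below is a hedged illustration with m = 4; the particular orthonormal basis, the helper names `dot` and `outcome_probability`, and the choice of v are assumptions made for the example.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def outcome_probability(v, subspace_basis):
    """Probability ||P v||^2, where P projects onto the span of an orthonormal basis."""
    return sum(dot(u, v) ** 2 for u in subspace_basis)

# An orthonormal basis of R^4; P0 projects onto the first two vectors, P1 onto the last two.
u1 = [0.5, 0.5, 0.5, 0.5]
u2 = [0.5, -0.5, 0.5, -0.5]
u3 = [0.5, 0.5, -0.5, -0.5]
u4 = [0.5, -0.5, -0.5, 0.5]
basis_P0, basis_P1 = [u1, u2], [u3, u4]

# Alice's unit vector lies in the subspace of P1, i.e. P1 v = v (the promise).
v = [(a + b) / math.sqrt(2) for a, b in zip(u3, u4)]

p0 = outcome_probability(v, basis_P0)
p1 = outcome_probability(v, basis_P1)
print(p0, p1)
```

Here v is sent as a 2-qubit state (log m = 2), and Bob's measurement yields outcome 1 with certainty up to rounding.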

The efficiency of this protocol comes from the fact that an m-dimensional unit vector can be 
“compressed” or “represented” as a log m-qubit state. Similar compression is not possible with 
classical bits, which suggests that any classical protocol will have to send the vector v more or less 
literally and hence will require a lot of communication. This turns out to be true, but the proof is 
quite hard [157]. It shows that any bounded-error protocol needs to send $\Omega(m^{1/3})$ bits.


16.6 Example 4: Quantum fingerprinting 


The examples of the previous sections were either exponential quantum improvements for promise
problems (Deutsch-Jozsa and vector-in-subspace) or polynomial improvements for total problems
(disjointness). We will now give an exponential improvement for the total problem of equality-testing,
but in a restricted setting called the simultaneous message passing (SMP) model. Alice
and Bob receive n-bit inputs x and y, respectively. They do not have any shared resources like shared
randomness or an entangled state, but they do have local randomness. They don’t communicate
with each other directly, but instead send a single message to a third party, called the Referee. The
Referee, upon receiving message $m_A$ from Alice and $m_B$ from Bob, should output the value f(x,y).
The goal is to compute f(x,y) with a minimal amount of communication from Alice and Bob to 
the Referee. 

We will see that for the equality problem there is an exponential savings in communication
when qubits are used instead of classical bits. Classically, the problem of the bounded-error
communication complexity of equality in the SMP model was first raised by Yao [250], and was open
for almost twenty years until Newman and Szegedy [195] exhibited a lower bound of $\Omega(\sqrt{n})$ bits.
This is tight, since Ambainis [11] constructed a bounded-error protocol for this problem where the
messages are $O(\sqrt{n})$ bits long (see Exercise 8). In contrast, in the quantum setting this problem
can be solved with very little communication: only O(log n) qubits suffice [73].

The quantum trick is to associate each $x \in \{0,1\}^n$ with a short quantum state $|\phi_x\rangle$, called
the quantum fingerprint of x. Just like with physical fingerprints, the idea is that a quantum
fingerprint is a small object that doesn’t contain very much information about the object x, but
that suffices for testing if the fingerprinted object equals some other fingerprinted object. As we
will see below, we can do such testing if the fingerprints are pairwise almost orthogonal. More
precisely, an $(n,m,\varepsilon)$-quantum fingerprinting scheme maps n-bit string x to m-qubit state $|\phi_x\rangle$
with the property that for all distinct $x,y \in \{0,1\}^n$, we have $|\langle\phi_x|\phi_y\rangle| \leq \varepsilon$.

We will now show how to obtain a specific (n, m, 0.02)-quantum fingerprinting scheme from
an error-correcting code $C : \{0,1\}^n \to \{0,1\}^N$ where $m = \log N \approx \log n$. There exist codes
where N = O(n) and any two distinct codewords C(x) and C(y) have Hamming distance close to N/2, say
$d(C(x),C(y)) \in [0.49N, 0.51N]$ (we won’t prove this here, but for instance a random linear code
will work). Define the quantum fingerprint of x as follows:

$$|\phi_x\rangle = \frac{1}{\sqrt{N}}\sum_{j=1}^{N}(-1)^{C(x)_j}|j\rangle.$$

This is a unit vector in an N-dimensional space, so it corresponds to only $\lceil \log N\rceil = \log n + O(1)$
qubits. For distinct x and y, the corresponding fingerprints will have small inner product:

$$\langle\phi_x|\phi_y\rangle = \frac{1}{N}\sum_{j=1}^{N}(-1)^{C(x)_j+C(y)_j} = \frac{N - 2d(C(x),C(y))}{N} \in [-0.02, 0.02].$$
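The relation between Hamming distance and fingerprint overlap can be checked on a toy code. The sketch below uses a small random linear code over a hypothetical generator matrix; the names `encode`, `fingerprint`, `inner` and `hamming` and the sizes n = 6, N = 24 are assumptions for illustration. A code this small is not guaranteed to have distances in [0.49N, 0.51N], so the example only verifies the identity $\langle\phi_x|\phi_y\rangle = (N - 2d)/N$, not the 0.02 bound.

```python
import math
import random

def encode(generator, x):
    """Codeword C(x) = x G mod 2 for an n x N binary generator matrix G."""
    N = len(generator[0])
    return [sum(x_i & row[j] for x_i, row in zip(x, generator)) % 2 for j in range(N)]

def fingerprint(codeword):
    """Amplitudes (-1)^{C(x)_j} / sqrt(N) of the fingerprint state."""
    N = len(codeword)
    return [(-1) ** bit / math.sqrt(N) for bit in codeword]

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

rng = random.Random(0)
n, N = 6, 24
generator = [[rng.randint(0, 1) for _ in range(N)] for _ in range(n)]
x = [1, 0, 1, 1, 0, 0]
y = [1, 1, 1, 0, 0, 0]
cx, cy = encode(generator, x), encode(generator, y)
overlap = inner(fingerprint(cx), fingerprint(cy))
distance = hamming(cx, cy)
print(overlap, (N - 2 * distance) / N)
```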


Figure 16.2: Quantum fingerprinting protocol for the equality problem


The quantum protocol is very simple (see Figure 16.2): Alice and Bob send quantum fingerprints
of x and y to the Referee, respectively. The Referee now has to determine whether x = y (which
corresponds to $\langle\phi_x|\phi_y\rangle = 1$) or $x \neq y$ (which corresponds to $\langle\phi_x|\phi_y\rangle \in [-0.02, 0.02]$). The following
test (Figure 16.3), sometimes called the SWAP-test, accomplishes this with small error probability.

Figure 16.3: Quantum circuit to test if $|\phi_x\rangle = |\phi_y\rangle$ or $|\langle\phi_x|\phi_y\rangle|$ is small

This circuit first applies a Hadamard transform to a qubit that is initially $|0\rangle$, then SWAPs
the other two registers conditioned on the value of the first qubit being $|1\rangle$, then applies another
Hadamard transform to the first qubit and measures it. Here SWAP is the operation that swaps the
two registers: $|\phi_x\rangle|\phi_y\rangle \mapsto |\phi_y\rangle|\phi_x\rangle$. The Referee receives $|\phi_x\rangle$ from Alice and $|\phi_y\rangle$ from Bob and
applies the test to these two states. An easy calculation reveals that the outcome of the measurement
is 1 with probability $(1-|\langle\phi_x|\phi_y\rangle|^2)/2$. Hence if $|\phi_x\rangle = |\phi_y\rangle$ then we observe a 1 with probability 0,
but if $|\langle\phi_x|\phi_y\rangle|$ is close to 0 then we observe a 1 with probability close to 1/2. Repeating this
procedure with several individual fingerprints can make the error probability arbitrarily close to 0.
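The "easy calculation" for the SWAP-test can be reproduced numerically. After the controlled SWAP and the second Hadamard, the branch with first qubit $|1\rangle$ carries the unnormalized state $(|a\rangle|b\rangle - |b\rangle|a\rangle)/2$. The sketch below sums its squared amplitudes; the function name `swap_test_probability_of_one` is hypothetical, and both states are assumed to be unit vectors of the same dimension.

```python
import math

def swap_test_probability_of_one(a, b):
    """Probability that the SWAP-test measurement yields 1 on input states a and b."""
    assert len(a) == len(b)
    total = 0.0
    for i in range(len(a)):
        for j in range(len(b)):
            # amplitude of |1>|i>|j> after Hadamard, controlled-SWAP, Hadamard
            amplitude = (a[i] * b[j] - b[i] * a[j]) / 2
            total += abs(amplitude) ** 2
    return total

a = [1.0, 0.0]
b = [1 / math.sqrt(2), 1 / math.sqrt(2)]       # overlap <a|b> = 1/sqrt(2)
print(swap_test_probability_of_one(a, a))       # identical states
print(swap_test_probability_of_one(a, b))       # compare with (1 - 1/2) / 2
```

For fingerprints with overlap at most 0.02 in absolute value, the formula gives outcome 1 with probability at least (1 - 0.0004)/2 = 0.4998 per test, whereas identical fingerprints never give outcome 1.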


Exercises 


1. (H) Prove that classical deterministic protocols with one message (from Alice to Bob), need 
to send n bits to solve the equality problem. 


2. (a) (H) Show that if $|\phi\rangle$ and $|\psi\rangle$ are non-orthogonal states (i.e., $\langle\phi|\psi\rangle \neq 0$), then there is
no two-outcome projective measurement that perfectly distinguishes these two states, in
the sense that applying the measurement on $|\phi\rangle$ always gives a different outcome from
applying the same measurement to $|\psi\rangle$.


(b) Prove that quantum protocols with one message (from Alice to Bob), need to send at
least n qubits to solve the equality problem (on n-bit inputs) with success probability 1
on every input. Assume for simplicity that Bob does a projective measurement rather
than a general POVM.


(c) (H) Prove that quantum protocols with one message (from Alice to Bob), need to send at
least log n qubits to solve the distributed Deutsch-Jozsa problem with success probability 1
on every input. Again assume for simplicity that Bob does a projective measurement
rather than a general POVM.


3. (H) Consider one-round quantum communication complexity. Alice gets input $x \in \{0,1\}^n$,
Bob gets input $y \in \{0,1\}^n$, and they want to compute some Boolean function f(x,y) of their
inputs. Assume that all rows of the communication matrix are different, i.e., for all distinct x and x′
there is a y such that $f(x,y) \neq f(x',y)$. They are allowed only one round of communication:
Alice sends a quantum message to Bob and Bob must then be able to give the right answer
with probability 1. Prove that Alice needs to send n qubits to Bob for this. You may assume
that Alice’s messages are pure states (this is without loss of generality).


4. Suppose Alice and Bob each have n-bit agendas, and they know that for exactly 25% of
the timeslots they are both free. Give a quantum protocol that finds such a timeslot with
probability 1, using only O(log n) qubits of communication.


5. (H) The disjointness problem of communication complexity is the following decision version
of the intersection problem: Alice receives an $x \in \{0,1\}^n$, Bob receives $y \in \{0,1\}^n$, and
f(x,y) = 0 if there is an i such that $x_i = y_i = 1$, and f(x,y) = 1 otherwise (i.e., f says
whether x and y represent disjoint subsets of [n]). Suppose there exists an m-qubit one-way
protocol that solves this problem, so where Alice sends Bob m qubits and then Bob outputs
f(x,y) with probability at least 2/3. Prove the lower bound $m = \Omega(n)$ on the number of
qubits sent.


6. (H) Consider the intersection problem: Alice has input $x \in \{0,1\}^n$, Bob has input $y \in \{0,1\}^n$,
and they want to find (with success probability $\geq 2/3$) an i such that $x_i = y_i = 1$, if such
an i exists. We know that using $r = O(\sqrt{n})$ messages between Alice and Bob, they can
solve the intersection problem with $O(\sqrt{n}\log n)$ qubits of communication (see Section 16.4).
We also know that with only r = 1 message (i.e., one-way communication) $\Theta(n)$ qubits of
communication are necessary and sufficient (see Exercise 5). Now suppose we limit them to
some $r \in \{1,\ldots,\sqrt{n}\}$ messages. This r is known to Alice and Bob. Describe a communication
protocol by means of which Alice and Bob can solve the intersection problem with at most r
messages, and $O((n/r)\log n)$ qubits of communication in total.


7. (a) Consider the following variant of the search problem: we are given query access to a
string $x \in \{0,1\}^n$, and we know a set $S \subseteq [n]$ of k < n elements such that $x_i = 0$ for all
$i \notin S$. Show that there is a quantum algorithm that can find a solution for this search
problem (i.e., an i such that $x_i = 1$, if there is one) with success probability $\geq 2/3$, using
$O(\sqrt{k})$ queries to x.


(b) Consider the following variant of the intersection problem of communication complexity:
Alice holds a string $x \in \{0,1\}^n$ of Hamming weight k, and Bob holds a string $y \in \{0,1\}^n$
of Hamming weight k. Give a quantum communication protocol that finds an i such
that $x_i = y_i = 1$ (if such an i exists) with success probability $\geq 2/3$, using $O(\sqrt{k}\log n)$
qubits of communication.


8. Consider an error-correcting code $C : \{0,1\}^n \to \{0,1\}^N$ where N = O(n), N is a square, and
any two distinct codewords are at Hamming distance $d(C(x),C(y)) \in [0.49N, 0.51N]$ (such
codes exist, but you don’t have to prove that).


(a) View the codeword C(x) as a $\sqrt{N} \times \sqrt{N}$ matrix. Show that if you choose a row index
uniformly at random and choose a column index uniformly at random, then the unique
index i where these row and column intersect, is uniformly distributed over $i \in \{1,\ldots,N\}$.


(b) (H) Give a classical bounded-error SMP-protocol for the equality problem where Alice
and Bob each send $O(\sqrt{n})$ bits to the Referee.


9. Alice and Bob want to solve the equality problem on n-bit inputs x and y (i.e., decide
whether x = y). They do not share randomness or entanglement but can use local (private)
randomness.


(a) (H) Fix a prime number $p \in [3n, 6n]$, then the set $\mathbb{F}_p$ of integers modulo p is a finite field
(i.e., it has a well-defined addition and multiplication). For $x = (x_0,\ldots,x_{n-1}) \in \{0,1\}^n$,
define the univariate polynomial $P_x : \mathbb{F}_p \to \mathbb{F}_p$ of degree < n as $P_x(t) = \sum_{i=0}^{n-1} x_i t^i$ (note
that the n bits of x are used as coefficients here, not as the argument of the polynomial).
Show that for distinct n-bit strings x and y, we have $\Pr_{t\in\mathbb{F}_p}[P_x(t) = P_y(t)] \leq 1/3$, where
the probability is taken over a uniformly random $t \in \mathbb{F}_p$.

(b) Use (a) to give a classical communication protocol where Alice sends an O(log n)-bit
message to Bob, and Bob can decide whether x = y with success probability $\geq 2/3$.

(c) Use (a) to give a quantum fingerprinting scheme $x \mapsto |\phi_x\rangle$, where quantum state $|\phi_x\rangle$
has O(log n) qubits, and $|\langle\phi_x|\phi_y\rangle| \in [0, 1/3]$ for all distinct n-bit strings x and y (prove
the latter property explicitly, it’s not enough to write down only the states).


10. The inner product problem in communication complexity is the function $f : \{0,1\}^n \times
\{0,1\}^n \to \{0,1\}$ defined by $f(x,y) = \sum_{i=1}^{n} x_i y_i \bmod 2$. Suppose there exists a quantum
protocol P for Alice and Bob that uses q qubits of communication (possibly using multiple
messages between Alice and Bob) and computes the inner product function with success
probability 1 (on every possible input x, y). The protocol does not assume any shared entangled
state at the start.


(a) Give a quantum protocol that uses 2q qubits of communication and implements the 2n-qubit
map $|x\rangle_A|y\rangle_B \mapsto (-1)^{x\cdot y}|x\rangle_A|y\rangle_B$ (possibly with some auxiliary qubits for each of
Alice and Bob; these should start and end in state $|0\rangle$).

(b) (H) Give a protocol where Alice transmits x to Bob using 2q qubits of communication.


(c) Derive a lower bound on q from (b) and Holevo’s theorem (Theorem 3 of Chapter 15;
be specific about which part of the theorem you invoke).


11. Consider the following problem in communication complexity. Alice’s input has two parts:
a unit vector $v \in \mathbb{R}^m$ and two orthogonal projectors $P_0$ and $P_1$. Bob’s input is an m × m
unitary U. They are promised that the vector Uv either lies in the subspace corresponding to
$P_0$ (i.e., $P_0 Uv = Uv$) or in the subspace corresponding to $P_1$ (i.e., $P_1 Uv = Uv$), and the problem
for Alice and Bob is to find out which of these two cases holds.


(a) Give a quantum protocol that uses two messages of O(log m) qubits (one message from
Alice to Bob and one from Bob to Alice) to solve this problem with success probability 1.

(b) (H) Show that there exists a constant c > 0 such that classical protocols need to send
$\Omega(m^c)$ bits of communication to solve this problem with error probability $\leq 1/3$, even
when they are allowed to send many messages.


12. (H) Consider the following communication complexity problem, called the “Hidden Matching
Problem.” Alice’s input is some $x \in \{0,1\}^n$. Bob’s input is a matching M, i.e., a partition of
$\{1,\ldots,n\}$ into n/2 disjoint unordered pairs (assume n is a power of 2 for simplicity). Their
goal is that Bob outputs a pair $\{i,j\} \in M$ together with the parity $x_i \oplus x_j$ of the two bits
indexed by that pair. It doesn’t matter which pair $\{i,j\} \in M$ Bob outputs, as long as the
additional bit of output equals the parity of the two indexed bits of x. Show that they can
solve this problem with success probability 1 using only a message of log n qubits from Alice
to Bob (and no communication from Bob to Alice).

Comment: One can show that classical one-way protocols need $\Omega(\sqrt{n})$ bits of communication to solve this
problem with small error probability.


13. (a) Suppose you have a state $\frac{1}{\sqrt{2}}(|0\rangle|\phi\rangle + |1\rangle|\psi\rangle)$, where $|\phi\rangle$ and $|\psi\rangle$ are quantum states with
real amplitudes. Suppose you apply a Hadamard gate to its first qubit and then measure
that first qubit. Show that the probability of measurement outcome 0 is $\frac{1}{2}(1 + \langle\phi|\psi\rangle)$.


(b) Suppose H is a subgroup of a finite group G, and $g \in G$ some element. Show (1) if
$g \in H$ then the cosets $g \circ H$ and H are equal
and (2) if $g \notin H$ then the cosets $g \circ H$ and H are disjoint.


(c) Suppose you are given quantum state $|\psi_H\rangle = \frac{1}{\sqrt{|H|}}\sum_{h\in H}|h\rangle$ (for an unknown $H \leq G$),
and an element $g \in G$. You may assume you have a unitary A available that implements
the group operation, $A : |g, h\rangle \mapsto |g, g \circ h\rangle$, and you may also apply a controlled version
of A. Give an algorithm that acts on $|\psi_H\rangle$ and possibly some auxiliary qubits, and that
outputs 0 with probability 1 if $g \in H$, and outputs 0 with probability $\leq 1/2$ if $g \notin H$.


(d) (H) Consider the following communication complexity problem. Alice and Bob both
know a finite group G, Alice gets as input some subgroup $H \leq G$ (for instance in the
form of a generating set for H) and Bob gets input $g \in G$. Give a one-way quantum
protocol where Alice sends to Bob a message of O(log |G|) qubits, and then Bob decides
with success probability $\geq 2/3$ whether $g \in H$.


Chapter 17


Entanglement and Non-Locality 


17.1 Quantum non-locality 


Entangled states are those that cannot be written as a tensor product of separate states. The most
famous one is the EPR-pair:

$$\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle).$$

Suppose Alice has the first qubit of the pair, and Bob has the second. If Alice measures her qubit
in the computational basis and gets outcome $b \in \{0,1\}$, then the state collapses to $|bb\rangle$. Similarly,
if Alice measures her qubit in some other basis, this will collapse the joint state (including Bob’s
qubit) to some state that depends on her measurement basis as well as its outcome. Somehow
Alice’s action seems to have an instantaneous effect on Bob’s side, even if the two qubits are
light-years apart! This was a great bother to Einstein, whose theory of relativity posits that
information and causation cannot travel faster than the speed of light. Einstein called such effects
of entanglement “spooky action at a distance” (in German: “spukhafte Fernwirkungen”), and
viewed them as a fundamental problem for quantum mechanics [106]. In his view, quantum mechanics
should be replaced by some “local realist” physical theory that would still have the same predictive
power as quantum mechanics. Here “local” means that information and causation act locally, not
faster than light, and “realistic” means that physical systems have definite, well-defined properties
(even if those properties may be unknown to us).

Note that the above experiment where Alice measures her half of the EPR-pair doesn’t actually
violate locality: no information is transferred from Alice to Bob. From Bob’s perspective there
is no difference between the situation where Alice measured and the situation where she didn’t.¹
For this experiment, a shared coin flip between Alice and Bob is a local realist physical model
that has exactly the same observable consequences as measuring the qubits of the EPR-pair in the
computational basis: a 50-50 distribution on outcomes $|00\rangle$ and $|11\rangle$. This shared-coin-flip model
is local because no information is transferred between Alice and Bob, and it’s realist because the
coin flip has a definite outcome (even if that outcome is unknown to Alice and Bob before they
measure).

Given this example, one might hope (and Einstein expected) that any kind of behavior that
comes from entangled states can be replaced by some local realist physical model. This way,
quantum mechanics could be replaced by an alternative physical theory with less counter-intuitive
behavior. Surprisingly, in the 1960s, John Bell [39] devised entanglement-based experiments whose
behavior cannot be reproduced by any local realist theory. In other words, we can let Alice and Bob
do certain measurements on an entangled state, and the resulting distributions on their outputs
predicted by quantum mechanics cannot be obtained from any local realist theory. This phenomenon
is known as “quantum non-locality.” It could of course be that the quantum mechanical predictions
of the resulting correlations are just wrong. However, in the early 1980s, such experiments were
actually done by Aspect and others [31], and they gave the outcomes that quantum mechanics
predicted.² Note that such experiments don’t prove quantum mechanics, but they disprove any
local realist physical theory.³

¹In fact, one can show that entanglement cannot replace communication, see for example Exercise 18.7.


Such experiments, which realize correlations that are provably impossible to realize with local 
realist models, are among the deepest and most philosophical results of 20th century physics: the 
commonsense idea of local realism is most probably false! Since Bell’s seminal work, the concept of 
quantum non-locality has been extensively studied, by physicists, philosophers, and more recently 
by computer scientists. 


In the next sections we review some interesting examples. The two-party setting of these 
examples is illustrated in Fig. 17.1: Alice receives input x and Bob receives input y, and they 
produce outputs a and b, respectively, that have to be correlated in a certain way (which depends 
on the game). They are not allowed to communicate. In physics language, we could assume they 
are “space-like separated,” which means that they are so far apart that they cannot influence each 
other during the course of the experiment (assuming information doesn’t travel faster than the 
speed of light). In the classical scenario they are allowed to share a random variable. Physicists 
would call this the “local hidden variable” that gives properties their definite value (that value may 
be unknown to the experimenter). This setting captures all local realist models. In the quantum 
model Alice and Bob are allowed to share entangled states, such as EPR-pairs. The goal is to show 
that entanglement-based strategies can do things that local realist strategies cannot. 


Figure 17.1: The non-locality scenario involving two parties: Alice and Bob receive inputs x and y,
respectively, and are required to produce outputs a and b that satisfy certain conditions. Once the
inputs are received, no communication is permitted between the parties.


²Modulo some technical “loopholes” due to imperfect photon sources, measurement devices, Alice and Bob not
being sufficiently far apart etc. These are still hotly debated, but most people accept that Aspect’s and later
experiments are convincing, and kill any hope of a complete local-realist explanation of nature. Recently [137] an
experiment was done that simultaneously closed the two most important loopholes.

³Despite its name, non-locality doesn’t disprove locality, but rather disproves the conjunction of locality and
realism: at least one of the two assumptions has to fail.


17.2 CHSH: Clauser-Horne-Shimony-Holt


In the CHSH game [87] Alice and Bob receive input bits x and y, and their goal is to output bits 
a and b, respectively, such that 

$$a \oplus b = x \wedge y, \tag{17.1}$$
(‘$\wedge$’ is logical AND; ‘$\oplus$’ is parity, i.e., addition mod 2) or, failing that, to satisfy this condition with
as high a probability as possible. 

First consider the case of classical deterministic strategies, so without any randomness. For 
these, Alice’s output bit depends solely on her input bit x, and similarly for Bob. Let $a_0$ be the
bit that Alice outputs if her input is $x = 0$, and $a_1$ the bit she outputs if $x = 1$. Let $b_0, b_1$ be the
outputs Bob gives on inputs $y = 0$ and $y = 1$, respectively. These four bits completely characterize
any deterministic strategy. Condition (17.1) becomes 


$$\begin{aligned}
a_0 \oplus b_0 &= 0,\\
a_0 \oplus b_1 &= 0,\\
a_1 \oplus b_0 &= 0,\\
a_1 \oplus b_1 &= 1.
\end{aligned} \tag{17.2}$$


It is impossible to satisfy all four equations simultaneously, since summing them modulo 2 yields 
0 = 1. Therefore it is impossible to satisfy Condition (17.1) perfectly. Since a probabilistic strategy 
(where Alice and Bob share randomness) is a probability distribution over deterministic strategies, 
it follows that no probabilistic strategy can have success probability better than 3/4 on every 
possible input (the 3/4 can be achieved simultaneously for every input, see Exercise 4).4 
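
The counting argument above can be checked by brute force. The following is a minimal sketch (the function name `chsh_wins` is chosen here for illustration) that enumerates all 16 deterministic strategies $(a_0, a_1, b_0, b_1)$ and counts on how many of the four inputs each one satisfies Condition (17.1).

```python
from itertools import product

def chsh_wins(a0, a1, b0, b1):
    """Number of inputs (x, y) on which this deterministic strategy wins CHSH."""
    alice = (a0, a1)
    bob = (b0, b1)
    return sum((alice[x] ^ bob[y]) == (x & y)
               for x, y in product((0, 1), repeat=2))

# Best number of inputs won by any of the 16 deterministic strategies.
best = max(chsh_wins(*strategy) for strategy in product((0, 1), repeat=4))
print(best, "of 4 inputs")
```

No strategy wins on all four inputs, so under the uniform input distribution the best deterministic (and hence the best shared-randomness) success probability is 3/4.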

Now consider the same problem but where Alice and Bob are supplied with a shared 2-qubit 
system initialized to the entangled state 


$$\frac{1}{\sqrt{2}}(|00\rangle - |11\rangle).$$


Such a state can easily be obtained from an EPR-pair by local operations, for instance if Alice 
applies a Z-gate to her qubit. Now the parties can produce outputs that satisfy Condition (17.1) 
with probability $\cos(\pi/8)^2 \approx 0.85$ (higher than what is possible in the classical case), as follows.
Recall the unitary operation that rotates the qubit by angle $\theta$:
$$R(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.$$
If $x = 0$ then Alice applies $R(-\pi/16)$ to her qubit; and if $x = 1$ she applies $R(3\pi/16)$. Then Alice measures
her qubit in the computational basis and outputs the resulting bit a. Bob’s procedure is the same,
depending on his input bit y. It is straightforward to calculate that if Alice rotates by $\theta_A$ and Bob
rotates by $\theta_B$, the state becomes


$$\frac{1}{\sqrt{2}}\left(\cos(\theta_A+\theta_B)(|00\rangle - |11\rangle) + \sin(\theta_A+\theta_B)(|01\rangle + |10\rangle)\right).$$
After the measurements, the probability that $a \oplus b = 0$ is $\cos(\theta_A+\theta_B)^2$. Note that if $x \wedge y = 0$
then $\theta_A+\theta_B = \pm\pi/8$, while if $x \wedge y = 1$ then $\theta_A+\theta_B = 3\pi/8$. Hence Condition (17.1) is satisfied
with probability $\cos(\pi/8)^2$ for all four input possibilities, showing that quantum entanglement


4Such statements, upper bounding the optimal success probability of classical strategies for a specific game, are
known as Bell inequalities. This specific one is called the CHSH inequality. 
allows Alice and Bob to win the game with a probability that’s higher than what the best classical
strategy can achieve. Tsirelson [86] showed that $\cos(\pi/8)^2$ is the best that quantum strategies can
do for CHSH, even if they are allowed to use much more entanglement than one EPR-pair (see 
Exercise 6). 
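
The winning probability of the entangled strategy can also be checked numerically. The sketch below is an illustration only (the helper names `rotation`, `apply_local` and `win_probability` are not from the text): it applies $R(\theta_A) \otimes R(\theta_B)$ to the shared state $\frac{1}{\sqrt{2}}(|00\rangle - |11\rangle)$ and sums the squared amplitudes of the winning outcomes.

```python
import math
from itertools import product

def rotation(theta):
    """The 2x2 rotation matrix R(theta)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def apply_local(ra, rb, state):
    """Apply ra (Alice) tensor rb (Bob) to a 2-qubit state stored as state[a][b]."""
    out = [[0.0, 0.0], [0.0, 0.0]]
    for a, b in product((0, 1), repeat=2):
        out[a][b] = sum(ra[a][i] * rb[b][j] * state[i][j]
                        for i, j in product((0, 1), repeat=2))
    return out

# Rotation angle as a function of the input bit (same rule for Alice and Bob).
ANGLES = {0: -math.pi / 16, 1: 3 * math.pi / 16}
s = 1 / math.sqrt(2)
shared = [[s, 0.0], [0.0, -s]]  # amplitudes of (|00> - |11>)/sqrt(2)

def win_probability(x, y):
    """Probability that the measured bits satisfy a XOR b = x AND y."""
    final = apply_local(rotation(ANGLES[x]), rotation(ANGLES[y]), shared)
    return sum(final[a][b] ** 2
               for a, b in product((0, 1), repeat=2) if (a ^ b) == (x & y))

probs = {(x, y): win_probability(x, y) for x, y in product((0, 1), repeat=2)}
for inputs, p in probs.items():
    print(inputs, round(p, 4))
```

All four inputs give the same value $\cos(\pi/8)^2$, matching the calculation in the text.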


17.3 Magic square game 


Is there a game where the quantum protocol always succeeds, while the best classical success 
probability is bounded below 1? A particularly elegant example is the following magic square 
game [25]. Consider the problem of labeling the entries of a $3 \times 3$ matrix with bits so that the
parity of each row is even, whereas the parity of each column is odd. This is clearly impossible: 
if the parity of each row is even then the sum of the 9 bits is 0 mod 2, but if the parity of each 
column is odd then the sum of the 9 bits is 1 mod 2. The two matrices 


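$$\begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 1 & 1 & 0 \end{pmatrix} \qquad \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 1 & 1 & 1 \end{pmatrix}$$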


each satisfy five out of the six constraints. For the first matrix, all rows have even parity, but only 
the first two columns have odd parity. For the second matrix, the first two rows have even parity, 
and all columns have odd parity. 

Consider the game where Alice receives $x \in \{1,2,3\}$ as input (specifying the number of a row),
and Bob receives $y \in \{1,2,3\}$ as input (specifying the number of a column). Their goal is to each
produce 3-bit outputs, $a_1a_2a_3$ for Alice and $b_1b_2b_3$ for Bob, such that


1. They satisfy the row/column parity constraints: $a_1 \oplus a_2 \oplus a_3 = 0$ and $b_1 \oplus b_2 \oplus b_3 = 1$.
2. They are consistent where the row intersects the column: $a_y = b_x$.


As usual, Alice and Bob are forbidden from communicating once the game starts, so Alice does not 
know y and Bob does not know x. We shall show the best classical strategy has success probability 
8/9, while there is a quantum strategy that always succeeds. 

An example of a deterministic strategy that attains success probability 8/9 (when the input xy
is uniformly distributed) is where Alice plays according to the rows of the first matrix above and
Bob plays according to the columns of the second matrix above. This succeeds in all cases, except
where x = y = 3. To see why this is optimal, note that for any other classical strategy, it is possible 
to represent it as two matrices as above but with different entries. Alice plays according to the 
rows of the first matrix and Bob plays according to the columns of the second matrix. We can 
assume that the rows of Alice’s matrix all have even parity; if she outputs a row with odd parity 
then they immediately lose, regardless of Bob’s output. Similarly, we can assume that all columns 
of Bob’s matrix have odd parity. Considering such a pair of matrices, the players lose at each 
entry where they differ. There must be such an entry, since otherwise it would be possible to have 
all rows even and all columns odd with one matrix. Thus, when the input xy is chosen uniformly 
from $\{1,2,3\} \times \{1,2,3\}$, the success probability of any classical strategy is at most 8/9.
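
The 8/9 bound can be confirmed by exhaustive search. This is a small sketch under the same assumption the argument makes (Alice only outputs even-parity rows and Bob only odd-parity columns, since anything else loses outright); the variable names are illustrative.

```python
from itertools import product

# All 3-bit strings with even parity (Alice's rows) and odd parity (Bob's columns).
even_rows = [t for t in product((0, 1), repeat=3) if sum(t) % 2 == 0]
odd_cols = [t for t in product((0, 1), repeat=3) if sum(t) % 2 == 1]

best = 0
for alice in product(even_rows, repeat=3):    # alice[x] = row output on input x
    for bob in product(odd_cols, repeat=3):   # bob[y] = column output on input y
        # They win on (x, y) iff they agree on the entry where row x meets column y.
        wins = sum(alice[x][y] == bob[y][x] for x in range(3) for y in range(3))
        best = max(best, wins)

print(best, "of 9 inputs")
```

The search covers 64 x 64 strategy pairs, and none wins on all nine inputs.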


5In fact, the game can be simplified so that Alice and Bob each output just two bits, since the parity constraint
determines the third bit. 
We now give the quantum strategy for this game. Let I, X, Y, Z be the $2 \times 2$ Pauli matrices
from Appendix A.9. Each is a 1-qubit observable with eigenvalues in $\{+1,-1\}$.6 That is, each can
be written as $P_+ - P_-$ where $P_+$ and $P_-$ are orthogonal projectors that sum to identity, and hence
define a two-outcome measurement with outcomes +1 and -1. For example, $Z = |0\rangle\langle 0| - |1\rangle\langle 1|$,
corresponding to a measurement in the computational basis (with $|b\rangle$ corresponding to outcome
$(-1)^b$). And $X = |+\rangle\langle +| - |-\rangle\langle -|$, corresponding to a measurement in the Hadamard basis. The
Pauli matrices are self-inverse, they anti-commute unless one of them is I (e.g., $XY = -YX$), and
$X = iZY$, $Y = iXZ$, and $Z = iYX$. Consider the following table, where each entry is a tensor
product of two Paulis:


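$$\begin{array}{|c|c|c|}
\hline
X \otimes X & Y \otimes Z & Z \otimes Y \\ \hline
Y \otimes Y & Z \otimes X & X \otimes Z \\ \hline
Z \otimes Z & X \otimes Y & Y \otimes X \\ \hline
\end{array}$$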


Because $(P_+ - P_-) \otimes (Q_+ - Q_-) = (P_+ \otimes Q_+ + P_- \otimes Q_-) - (P_+ \otimes Q_- + P_- \otimes Q_+)$, each such
product is itself a $\{+1,-1\}$-valued observable. Hence each product of Pauli matrices corresponds
to a measurement on a 2-qubit space, with outcomes +1 and -1.

Note that the observables along each row commute and their product is $I \otimes I$, and the
observables along each column commute and their product is $-I \otimes I$. This implies that for any
2-qubit state, performing the three measurements along any row results in three $\{+1,-1\}$-valued
bits whose product is +1. Also, performing the three measurements along any column results in
three $\{+1,-1\}$-valued bits whose product is -1.
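
The commutation and product claims about the table can be verified mechanically. Below is a minimal sketch in plain Python (helper names `kron`, `matmul`, `close` and `check_line` are illustrative) that builds the nine 4 x 4 observables and checks each row and column.

```python
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]

def kron(a, b):
    """Tensor product of two 2x2 matrices, as a 4x4 nested list."""
    return [[a[i][j] * b[k][l] for j in range(2) for l in range(2)]
            for i in range(2) for k in range(2)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def close(a, b):
    return all(abs(a[i][j] - b[i][j]) < 1e-12 for i in range(4) for j in range(4))

# The table of observables from the text, row by row.
TABLE = [[kron(X, X), kron(Y, Z), kron(Z, Y)],
         [kron(Y, Y), kron(Z, X), kron(X, Z)],
         [kron(Z, Z), kron(X, Y), kron(Y, X)]]

def check_line(ops, sign):
    """True if the three observables commute pairwise and multiply to sign * identity."""
    target = [[sign if i == j else 0 for j in range(4)] for i in range(4)]
    commute = all(close(matmul(p, q), matmul(q, p)) for p in ops for q in ops)
    product = matmul(matmul(ops[0], ops[1]), ops[2])
    return commute and close(product, target)

rows_ok = all(check_line(TABLE[r], +1) for r in range(3))
cols_ok = all(check_line([TABLE[r][c] for r in range(3)], -1) for c in range(3))
print(rows_ok, cols_ok)
```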

We can now describe the quantum protocol. It uses two pairs of entangled qubits, each of which
is in initial state


$$\frac{1}{\sqrt{2}}(|01\rangle - |10\rangle)$$
(again, such states can be obtained from EPR-pairs by local operations). Alice, on input x, applies
three 2-qubit measurements corresponding to the observables in row x of the above table. For each
measurement, if the result is +1 then she outputs 0, and if the result is -1 then she outputs 1.
Similarly, Bob, on input y, applies the measurements corresponding to the observables in column y,
and converts the $\pm 1$-outcomes into bits.

We have already established that Alice and Bob’s output bits satisfy the required parity
constraints. It remains to show that Alice and Bob’s output bits agree at the point where the row
meets the column. For that measurement, Alice and Bob are measuring with respect to the same
observable in the above table. Because all the observables in each row and in each column
commute, we may assume that the place where they intersect is the first observable applied. Those
bits are obtained by Alice and Bob each measuring $\frac{1}{2}(|01\rangle - |10\rangle)(|01\rangle - |10\rangle)$ with respect to the
observable in entry (x,y) of the table. To show that their measurements will agree for all cases of
xy, we consider the individual Pauli measurements on the individual entangled pairs of the form
$\frac{1}{\sqrt{2}}(|01\rangle - |10\rangle)$. Let $a'$ and $b'$ denote the 0/1-valued outcomes of the first measurement, and $a''$
and $b''$ denote the outcomes of the second. The measurement associated with the tensor product of
two observables gives the same distribution over outcomes as measuring each individual observable
and then taking the product of the two results. Hence we have $a_y = a' \oplus a''$ and $b_x = b' \oplus b''$. It is


6See Section 1.2.2. In particular, a $\pm 1$-valued observable A can be written as A = P - Q, where P and Q are
projectors on two orthogonal subspaces such that P + Q = I. This corresponds to a two-outcome measurement 
specified by projectors P and Q with outcomes +1 and —1, respectively. 
straightforward to verify that if the same measurement from $\{X, Y, Z\}$ is applied to each qubit
of $\frac{1}{\sqrt{2}}(|01\rangle - |10\rangle)$ then the outcomes will be distinct: $a' \oplus b' = 1$ and $a'' \oplus b'' = 1$. We now have
$a_y = b_x$, because


$$a_y \oplus b_x = (a' \oplus a'') \oplus (b' \oplus b'') = (a' \oplus b') \oplus (a'' \oplus b'') = 1 \oplus 1 = 0. \tag{17.3}$$
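
The anticorrelation used in this last step can be checked directly. The sketch below (with an illustrative helper `expectation`) computes $\langle\psi|P \otimes P|\psi\rangle$ on $|\psi\rangle = \frac{1}{\sqrt{2}}(|01\rangle - |10\rangle)$ for each Pauli $P \in \{X, Y, Z\}$. An expectation of -1 means the product of the two $\pm 1$ outcomes is always -1, i.e., the two outcome bits always differ.

```python
import math

X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]

s = 1 / math.sqrt(2)
singlet = [0, s, -s, 0]  # amplitudes of |00>, |01>, |10>, |11>

def expectation(p, psi):
    """<psi| p (x) p |psi> for a 1-qubit operator p applied to both qubits."""
    total = 0
    for r in range(4):
        for c in range(4):
            # Entry (r, c) of p tensor p: split each index into its two bits.
            entry = p[r >> 1][c >> 1] * p[r & 1][c & 1]
            total += complex(psi[r]).conjugate() * entry * psi[c]
    return total

correlations = {name: expectation(p, singlet)
                for name, p in (("X", X), ("Y", Y), ("Z", Z))}
for name, value in correlations.items():
    print(name, round(value.real, 6))
```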


17.4 A non-local version of distributed Deutsch-Jozsa 


The previous two examples used small amounts of entanglement: one EPR-pair for CHSH, two 
EPR-pairs for magic square. In both cases we could show that classical protocols need at least 
some communication if they want to achieve the same as what entanglement-based protocols can 
achieve without any communication. We will now give a non-locality game that’s parametrized by a
number n, and where Alice and Bob’s quantum strategy uses $\log n$ EPR-pairs [65]. The advantage
is that we can show that classical protocols for this game need not just some nonzero amount of
classical communication, but actually a lot of it.


Non-local DJ problem: Alice and Bob receive n-bit inputs x and y that satisfy the 
DJ promise: either x = y, or x and y differ in exactly n/2 positions. The task is for 
Alice and Bob to provide outputs $a, b \in \{0,1\}^{\log n}$ such that if x = y then a = b, and if
x and y differ in exactly n/2 positions then $a \neq b$.


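They achieve this as follows:
1. Alice and Bob share $\log n$ EPR-pairs, i.e., the maximally entangled state $\frac{1}{\sqrt{n}}\sum_{i=0}^{n-1}|i\rangle|i\rangle$.
2. They both apply locally a conditional phase to obtain: $\frac{1}{\sqrt{n}}\sum_{i=0}^{n-1}(-1)^{x_i}|i\rangle\,(-1)^{y_i}|i\rangle$.


3. They both apply a Hadamard transform, obtaining


$$\frac{1}{n\sqrt{n}}\sum_{i=0}^{n-1}(-1)^{x_i}\sum_{a\in\{0,1\}^{\log n}}(-1)^{i\cdot a}|a\rangle\,(-1)^{y_i}\sum_{b\in\{0,1\}^{\log n}}(-1)^{i\cdot b}|b\rangle$$


$$= \frac{1}{n\sqrt{n}}\sum_{a,b\in\{0,1\}^{\log n}}\left(\sum_{i=0}^{n-1}(-1)^{x_i+y_i+i\cdot(a\oplus b)}\right)|a\rangle|b\rangle.$$


4. They measure in the computational basis and output the results a and b, respectively.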


For every a, the probability that both Alice and Bob obtain the same result a is:


$$\left|\frac{1}{n\sqrt{n}}\sum_{i=0}^{n-1}(-1)^{x_i+y_i}\right|^2,$$


7Note that k EPR-pairs $\left(\frac{1}{\sqrt{2}}(|0\rangle_A|0\rangle_B + |1\rangle_A|1\rangle_B)\right)^{\otimes k}$ can also be written as $\frac{1}{\sqrt{2^k}}\sum_{i\in\{0,1\}^k}|i\rangle_A|i\rangle_B$ if we reorder
the qubits, putting Alice’s k qubits on the left and Bob’s on the right. While these two ways of writing the state
strictly speaking correspond to two different vectors of amplitudes, they still represent the same bipartite physical
state, and we will typically view them as equal.

which is 1/n if x = y, and 0 otherwise. Hence the outputs satisfy a = b with probability 1 if x = y, and
with probability 0 if x and y differ in exactly n/2 positions. This solves the problem perfectly using prior entanglement.
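
As a sanity check, the final amplitudes of the protocol can be evaluated for a small instance. The sketch below is illustrative only: the function names and the example inputs are chosen here, n is assumed to be a power of 2, and outputs a, b are represented as integers in {0, ..., n-1}.

```python
import math

def amplitude(x, y, a, b):
    """Amplitude of |a>|b> after step 3, for n-bit inputs x and y (lists of bits)."""
    n = len(x)
    total = 0
    for i in range(n):
        dot = bin(i & (a ^ b)).count("1")  # inner product i . (a XOR b) of bit strings
        total += (-1) ** (x[i] + y[i] + dot)
    return total / (n * math.sqrt(n))

def prob_equal_outputs(x, y):
    """Probability that Alice and Bob measure the same string a = b."""
    n = len(x)
    return sum(amplitude(x, y, a, a) ** 2 for a in range(n))

x = [0, 1, 1, 0, 1, 0, 0, 1]  # example input with n = 8
# Flip the first n/2 bits, so that y_far differs from x in exactly n/2 positions.
y_far = [1 - bit if i < 4 else bit for i, bit in enumerate(x)]

print(prob_equal_outputs(x, x))      # inputs equal
print(prob_equal_outputs(x, y_far))  # inputs differ in n/2 positions
```

For equal inputs the outputs always coincide, and for inputs at distance n/2 they never do, as the promise requires.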

What about classical protocols? Suppose there is a classical protocol that uses C bits of
communication, and that wins the non-local Deutsch-Jozsa problem with success probability 1. If
Alice and Bob ran this protocol, and then Alice communicated her output a to Bob (using an
additional $\log n$ bits), then they could solve the distributed Deutsch-Jozsa problem since Bob could
then check whether a = b or $a \neq b$. But we know from Section 16.3 that solving the distributed
Deutsch-Jozsa problem requires at least 0.007n bits of communication. Hence $C + \log n \geq 0.007n$,
so $C \geq 0.007n - \log n = \Omega(n)$. Thus we have a non-locality problem that can be solved perfectly if
Alice and Bob share $\log n$ EPR-pairs, while classically it needs not just some communication, but
actually a lot of communication if we want to solve it perfectly.


Exercises 
1. Suppose Alice and Bob share an EPR-pair $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$.


(a) Let U be a 1-qubit unitary. Show that the following two states are the same: (1) the
state obtained if Alice applies U to her qubit of the EPR-pair;
(2) the state obtained if Bob applies the transpose $U^T$ to his qubit of the EPR-pair.
(b) (H) What state do you get if each of Alice and Bob applies a Hadamard transform to 
their qubit of the EPR-pair? 


2. Alice and Bob share an EPR-pair, $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$. Suppose they each measure their qubit with
an X-observable (which corresponds to a particular projective measurement with possible
outcomes +1, -1).


(a) Show that Alice’s measurement outcome is uniformly distributed, so 50% probability of 
outcome +1 and 50% probability of outcome —1. 

(b) (H) Show that Alice’s and Bob’s measurement outcomes are always equal. 

(c) Suppose we view $X \otimes X$ as one 2-qubit observable (with possible outcomes +1, -1)
instead of two 1-qubit observables. What is the probability distribution on the two
possible outcomes?


3. Alice and Bob share n EPR-pairs. Call their shared 2n-qubit state $|\psi\rangle_{AB}$.


(a) Let U be an arbitrary n-qubit unitary and $\bar{U}$ be U after conjugating its entries (without
transposing). Prove that $(U \otimes \bar{U})|\psi\rangle_{AB} = |\psi\rangle_{AB}$.

(b) Suppose Alice receives some input x, and she does an n-qubit unitary $U_x$ on her part of
the state and then measures in the computational basis, obtaining a classical outcome
$a \in \{0,1\}^n$. What is the probability distribution over Alice’s measurement outcomes,
and why?

(c) Suppose Bob receives the same input x as Alice already received. How can he learn
Alice’s measurement outcome a from part (b) without communication? (you may assume
Bob knows the map $x \mapsto U_x$)


4. (H) Give a classical strategy using shared randomness for the CHSH game, such that Alice 
and Bob win the game with probability at least 3/4 for every possible input x,y (note the 
order of quantification: the same strategy has to work for every x,y). 
5. “Mermin’s game” is the following. Consider three space-like separated players: Alice, Bob,
and Charlie. Alice receives input bit x, Bob receives input bit y, and Charlie receives input
bit z. The input satisfies the promise that $x \oplus y \oplus z = 0$. The goal of the players is to output
bits a, b, c, respectively, such that $a \oplus b \oplus c = \mathrm{OR}(x, y, z)$. In other words, the outputs should
sum to 0 (mod 2) if x = y = z = 0, and should sum to 1 (mod 2) if x + y + z = 2.


(a) Show that every classical deterministic strategy will fail on at least one of the 4 allowed 
inputs. 


(b) Show that every classical randomized strategy has success probability at most 3/4 under 
the uniform distribution on the four allowed inputs xyz. 


(c) Suppose the players share the following entangled 3-qubit state: 


$$\frac{1}{2}(|000\rangle - |011\rangle - |101\rangle - |110\rangle).$$


Suppose each player does the following: if his/her input bit is 1, apply H to his/her 
qubit, otherwise do nothing. Describe the resulting 3-qubit superposition. 


(d) Using (c), give a quantum strategy that wins the above game with probability 1 on every 
input that satisfies the promise. 


6. (H) This question examines how well the best quantum protocol can do for CHSH (resulting
in the so-called “Tsirelson bound”). Consider a protocol where Alice and Bob share a 2k-qubit
state $|\psi\rangle = |\psi\rangle_{AB}$ with k qubits for Alice and k for Bob (the state can be arbitrary and
need not consist of EPR-pairs). Alice has two possible $\pm 1$-valued observables $A_0$ and $A_1$, and
Bob has two possible $\pm 1$-valued observables $B_0$ and $B_1$. Each of these observables acts on k
qubits. On inputs $x \in \{0,1\}$ and $y \in \{0,1\}$, respectively, Alice measures her half of $|\psi\rangle$ with
$A_x$ and outputs the resulting sign $a \in \{+1,-1\}$, and Bob measures his half of $|\psi\rangle$ with $B_y$
and outputs the resulting sign b. Note that we treat the output bits as signs instead of 0/1
now. However, the winning condition is the same: the AND of the input bits should equal
the parity (XOR) of the output bits. So Alice and Bob win the game if $(-1)^{xy} = ab$.


(a) Show that the expected value of the product ab on inputs x,y is $\langle\psi|A_x \otimes B_y|\psi\rangle$ (this is
the same as $\mathrm{Tr}[(A_x \otimes B_y)|\psi\rangle\langle\psi|]$).


(b) Define 2k-qubit operator $C = A_0 \otimes B_0 + A_0 \otimes B_1 + A_1 \otimes B_0 - A_1 \otimes B_1$. Show that the
winning probability of the protocol (averaged over all 4 input pairs x, y) is $\frac{1}{2} + \frac{1}{8}\langle\psi|C|\psi\rangle$.

(c) Show that $C^2 = 4I + (A_0A_1 - A_1A_0) \otimes (B_1B_0 - B_0B_1)$, where I is the 2k-qubit identity
matrix.


(d) Show that $\langle\psi|C|\psi\rangle \leq \sqrt{8}$.


(e) What can you conclude about the best-possible winning probability among all possible 
quantum protocols for CHSH? 
Chapter 18


Quantum Cryptography 


18.1 Saving cryptography from Shor 


Most classical public-key cryptography in use today can be broken by a large quantum computer. 
In particular, the RSA system relies on the hardness of factoring integers and hence is broken 
by Shor’s factoring algorithm (see Exercise 5.3); and Diffie-Hellman relies on the hardness of the
discrete logarithm problem which was also broken by Shor (see Exercise 6.3). This could clearly 
become a huge problem for society if and when a large quantum computer is realized: if we cannot 
securely send messages, make payments or sign transactions online anymore, then much of our 
economy and society breaks down, or will at least need to be heavily reconfigured. 

There are two ways to address this problem. On the one hand we can try to design other 
classical cryptographic systems, based on the assumed hardness (even for quantum computers) of 
computational problems other than factoring or discrete log. This part of classical cryptography is 
(slightly confusingly) called post-quantum cryptography [52]. Its most famous cryptosystem to date 
is “learning with errors” (LWE) [204], which relies on the assumed hardness of certain computational 
problems in integer lattices. 

On the other hand, we can also try to design cryptographic systems that explicitly rely on quan- 
tum effects. This area is called quantum cryptography and is the topic of this chapter. Compared 
to post-quantum cryptography, this has the disadvantage that even the honest users of the scheme 
need to have a (simple) quantum computer at their disposal, but it has the advantage that the 
security against adversaries in some cases is information-theoretic, not predicated on the assumed 
but unproven hardness of some computational problems. 


18.2 Quantum key distribution 


One of the most basic tasks of cryptography is to allow Alice to send a message $M \in \{0,1\}^n$ to Bob
(whom she trusts) over a public channel, without allowing a third party Eve (for “eavesdropper”) to
get any information about M from tapping the channel.
The goal here is not minimal communication, but secrecy. This is often done
by public-key cryptography such as RSA. Such schemes, however, are only computationally secure, 
not information-theoretically secure: all the information about the private key can be computed 
from the public key, it just appears to take a lot of time to compute it—assuming of course that 
problems like factoring are classically hard, and that nobody builds a quantum computer... 
In contrast, the following “one-time pad” scheme is information-theoretically secure. If Alice
and Bob share a secret key $K \in \{0,1\}^n$ then Alice can send $C = M \oplus K$ over the channel. By adding
K to what he received, Bob learns M. On the other hand, if Eve didn’t know anything about K
then she learns nothing about M from tapping the message $M \oplus K$ that goes over the channel.
How can we make Alice and Bob share a secret key? In the classical world this is impossible, but 
with quantum communication it can be done! 
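
The one-time pad itself is only a bitwise XOR. A minimal sketch, with bit strings represented as Python lists and the key drawn from the standard `secrets` module (the names `xor_bits`, `message` and `key` are illustrative):

```python
import secrets

def xor_bits(u, v):
    """Bitwise XOR of two equal-length bit lists."""
    return [p ^ q for p, q in zip(u, v)]

n = 16
message = [secrets.randbelow(2) for _ in range(n)]  # Alice's message M
key = [secrets.randbelow(2) for _ in range(n)]      # shared secret key K

ciphertext = xor_bits(message, key)    # C = M xor K goes over the channel
recovered = xor_bits(ciphertext, key)  # Bob adds K again to recover M

print(recovered == message)
```

Decryption works because $K \oplus K = 0$. The key must be uniformly random, as long as the message, and used only once.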

Below we describe the famous BB84 quantum key distribution (QKD) protocol of Bennett and 
Brassard [51]. Consider two possible bases: basis 0 is the computational basis $\{|0\rangle,|1\rangle\}$, and basis 1
is the Hadamard basis $\{|+\rangle,|-\rangle\}$. The main property of quantum mechanics that we’ll use is
that if a bit b is encoded in an unknown basis, then Eve cannot get information about b without
disturbing the state, and the latter can be detected by Alice and Bob.1


1. Alice chooses n random bits $a_1,\ldots,a_n$ and n random bases $b_1,\ldots,b_n$. She sends $a_i$ to Bob
in basis $b_i$ over the public quantum channel. For example, if $a_i = 0$ and $b_i = 1$ then the i-th
qubit that she sends is in state $|+\rangle$.


2. Bob chooses random bases $b'_1,\ldots,b'_n$ and measures the qubits he received in those bases,
yielding bits $a'_1,\ldots,a'_n$.

3. Bob sends Alice all $b'_i$ (this also signals to Alice that Bob has measured the qubits he received),
and Alice sends Bob all $b_i$. Note that for roughly n/2 of the i’s, Alice and Bob used the same
basis $b_i = b'_i$. For those i Bob should have $a'_i = a_i$ (if there was no noise and Eve didn’t
tamper with the i-th qubit on the channel). Both Alice and Bob know for which i’s this holds.
Let’s call these roughly n/2 positions the “shared string.”


4. Alice randomly selects n/4 locations in the shared string, and sends Bob those locations as
well as the values $a_i$ at those locations. Bob then checks whether they have the same bits
in those positions. If the fraction of errors is bigger than some number p, then they suspect
some eavesdropper was tampering with the channel, and they abort.2


5. If the test is passed, then they discard the n/4 test-bits, and have roughly n/4 bits left in their
shared string. This is called the “raw key.” Now they do some classical postprocessing on the
raw key: “information reconciliation” to ensure they end up with exactly the same shared
string, and “privacy amplification” to ensure that Eve has negligible information about that
shared string.3


The communication is n qubits in step 1, 2n bits in step 3, O(n) bits in step 4, and O(n) bits in 
step 5. So the required amount of communication is linear in the length of the shared secret key 
that Alice and Bob end up with. 
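
Steps 1 to 3 of the protocol can be simulated classically for the idealized case with no noise and no eavesdropper. The sketch below is an illustration under those assumptions. A real-amplitude qubit state $\cos(t)|0\rangle + \sin(t)|1\rangle$ is represented by its angle t, and the names `encode`, `measure` and `bb84_sift` are chosen here, not taken from the text.

```python
import math
import random

def encode(bit, basis):
    """Angle of the BB84 state for `bit` in `basis` (0 = computational, 1 = Hadamard)."""
    return bit * math.pi / 2 + basis * math.pi / 4

def measure(angle, basis, rng):
    """Measure the state at `angle` in `basis` and return the outcome bit."""
    # Probability of outcome 0; rounding removes floating-point dust such as 1e-33.
    p_zero = round(math.cos(angle - basis * math.pi / 4) ** 2, 12)
    return 0 if rng.random() < p_zero else 1

def bb84_sift(n, rng):
    """Run steps 1-3 and return Alice's and Bob's bits at the positions with equal bases."""
    a = [rng.randrange(2) for _ in range(n)]        # Alice's bits
    b = [rng.randrange(2) for _ in range(n)]        # Alice's bases
    b_prime = [rng.randrange(2) for _ in range(n)]  # Bob's bases
    a_prime = [measure(encode(a[i], b[i]), b_prime[i], rng) for i in range(n)]
    keep = [i for i in range(n) if b[i] == b_prime[i]]
    return [a[i] for i in keep], [a_prime[i] for i in keep]

alice_key, bob_key = bb84_sift(1000, random.Random(1))
print(len(alice_key), alice_key == bob_key)
```

Roughly half of the positions survive the sifting, and on those positions Bob’s bits agree with Alice’s. The test in step 4 and the postprocessing in step 5 are not modeled.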


1Quantum key distribution might in fact better be called “quantum eavesdropper detection.” There is another
assumption underlying BB84 that should be made explicit: we assume that the classical channel used in steps 3-5 
is “authenticated,” meaning that Alice and Bob know they are talking to each other, and Eve can listen but not 
change the bits sent over the classical channel (in contrast to the qubits sent during step 1 of the protocol, which Eve 
is allowed to manipulate in any way she wants). One can authenticate a classical communication channel by using 
some shared secret key; if this is used, then one may think of QKD as something that allows to grow an initial shared 
secret key, rather than as something that conjures up a shared random key out of nothing. 

2The number p can for instance be set to the natural error-rate that the quantum channel would have if there
were no eavesdropper. 

3This can be done for instance by something called the “leftover hash lemma.” 
It’s quite hard to formally prove that this protocol yields (with high probability) a shared key
about which Eve has negligible information. In fact it took more than 12 years before BB84 was
finally proven secure [188, 175]. The main reason it works is that when the qubits that encode
$a_1,\ldots,a_n$ are going over the public channel, Eve doesn’t know yet in which bases $b_1,\ldots,b_n$ these
are encoded (she will learn the $b_i$ later from tapping the classical communication in step 3, but
at that point this information is not of much use to her anymore). She could try to get as much
information as she can about $a_1,\ldots,a_n$ by some measurement, but there’s an
information-vs-disturbance tradeoff: the more information Eve learns about $a_1,\ldots,a_n$ by measuring the qubits,
the more she will disturb the state, and the more likely it is that Alice and Bob will detect her
presence in step 4.


We won’t go into the full proof details here, just illustrate the information-disturbance tradeoff
for the case where Eve individually attacks the qubits encoding each bit in step 1 of the protocol.4
In Fig. 18.1 we give the four possible states for one BB84-qubit. If Alice wants to send $a_i = 0$,
then she sends a uniform mixture of $|0\rangle$ and $|+\rangle$ across the channel; if Alice wants to send $a_i = 1$
she sends a uniform mixture of $|1\rangle$ and $|-\rangle$. Suppose Eve tries to learn $a_i$ from the qubit on the
channel. The best way for her to do this is to measure in the orthonormal basis corresponding to
states $\cos(\pi/8)|0\rangle + \sin(\pi/8)|1\rangle$ and $-\sin(\pi/8)|0\rangle + \cos(\pi/8)|1\rangle$. Note that the first state is halfway
between the two encodings of 0, and the second state is halfway between the two encodings of 1
(remember that $|-\rangle$ and $-|-\rangle$ are physically indistinguishable because they only differ by a global
phase). This will give her the value of $a_i$ with probability $\cos(\pi/8)^2 \approx 0.85$ (remember the 2-to-1
quantum random access code from Exercise 2 of Chapter 15). However, this measurement will
change the state of the qubit by an angle of at least $\pi/8$, so if Bob now measures the qubit he
receives in the same basis as Alice, then his probability of recovering the incorrect value of $a_i$ is
at least $\sin(\pi/8)^2 \approx 0.15$ (if Bob measured in a different basis than Alice, then the result will be
discarded anyway). If this i is among the test-bits Alice and Bob use in step 4 of the protocol
(which happens with probability 1/2), then they will detect an error. Eve can of course try a less
disturbing measurement to reduce the probability of being detected, but such a measurement will
also have lower probability of telling her $a_i$.
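
Eve’s success probability with this measurement can be computed for each of the four BB84 states. The following sketch uses the angle representation $\cos(t)|0\rangle + \sin(t)|1\rangle$ of real-amplitude states; the names `ENCODINGS`, `EVE` and `eve_guess_prob` are illustrative.

```python
import math

# Angles of the two encodings of each bit: |0>, |+> for 0 and |1>, -|-> for 1.
ENCODINGS = {0: [0.0, math.pi / 4], 1: [math.pi / 2, 3 * math.pi / 4]}

# Eve's first basis state sits at angle pi/8 (she then guesses 0);
# the orthogonal basis state sits at angle 5*pi/8 (she then guesses 1).
EVE = math.pi / 8

def eve_guess_prob(bit, angle):
    """Probability that Eve's measurement outcome equals the encoded bit."""
    p_outcome_zero = math.cos(angle - EVE) ** 2
    return p_outcome_zero if bit == 0 else 1 - p_outcome_zero

probs = [eve_guess_prob(bit, t) for bit, ts in ENCODINGS.items() for t in ts]
print([round(p, 4) for p in probs])
```

All four states give the same success probability $\cos(\pi/8)^2$, so Eve learns $a_i$ equally well regardless of the basis Alice used.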


Figure 18.1: The four possible states in BB84 encoding: $|0\rangle$ and $|+\rangle$ are two different encodings
of 0, and $|1\rangle$ and $|-\rangle$ are two different encodings of 1.


4The more complicated situation where Eve does an n-qubit measurement on all qubits of step 1 simultaneously
can be reduced to the case of individual-qubit measurements by something called the quantum De Finetti theorem, 
but we won’t go into the details here. 
18.3 Reduced density matrices and the Schmidt decomposition


Suppose Alice and Bob share some pure state $|\phi\rangle$. If this state is entangled, it cannot be written
as a tensor product $|\phi_A\rangle \otimes |\phi_B\rangle$ of separate pure states for Alice and Bob. Still, there is a way to
describe Alice’s local state as a mixed state, by tracing out Bob’s part. Formally, if $C \otimes D$ is a
tensor product matrix then $\mathrm{Tr}_B(C \otimes D) = C \cdot \mathrm{Tr}(D)$. By extending this linearly to matrices that are
not of product form, the operation $\mathrm{Tr}_B$ is well-defined on all mixed states. Note that $\mathrm{Tr}_B$ removes
Bob’s part of the state, leaving just Alice’s part of the state. If $\rho_{AB}$ is some bipartite state (mixed
or pure, entangled or not), then $\rho_A = \mathrm{Tr}_B(\rho_{AB})$ is Alice’s local density matrix. This describes all
the information she has. For example, for an EPR-pair $|\phi\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$, the corresponding
density matrix is


$$\begin{aligned}
\rho_{AB} &= \tfrac{1}{2}\left(|00\rangle\langle 00| + |00\rangle\langle 11| + |11\rangle\langle 00| + |11\rangle\langle 11|\right)\\
&= \tfrac{1}{2}\left(|0\rangle\langle 0| \otimes |0\rangle\langle 0| + |0\rangle\langle 1| \otimes |0\rangle\langle 1| + |1\rangle\langle 0| \otimes |1\rangle\langle 0| + |1\rangle\langle 1| \otimes |1\rangle\langle 1|\right),
\end{aligned}$$
and since $\mathrm{Tr}(|a\rangle\langle b|) = 1$ if a = b and $\mathrm{Tr}(|a\rangle\langle b|) = 0$ if $|a\rangle$ and $|b\rangle$ are orthogonal, we have


$$\rho_A = \mathrm{Tr}_B(\rho_{AB}) = \tfrac{1}{2}\left(|0\rangle\langle 0| + |1\rangle\langle 1|\right).$$


In other words, Alice’s local state is the same as a random coin flip! Similarly we can compute Bob’s
local state by tracing out Alice’s part of the space: $\rho_B = \mathrm{Tr}_A(\rho_{AB})$. Note that the original 2-qubit
density matrix $\rho_{AB}$ is not equal to $\rho_A \otimes \rho_B$, because the tracing-out operation has “removed” the
entanglement between the two qubits.
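
The partial trace of the EPR-pair can be reproduced in a few lines. This is a minimal sketch for two qubits only (the helpers `density_matrix` and `trace_out_bob` are illustrative), with basis state $|ab\rangle$ stored at index 2a + b.

```python
import math

def density_matrix(psi):
    """The 4x4 matrix |psi><psi| for a 2-qubit amplitude vector psi."""
    return [[psi[r] * complex(psi[c]).conjugate() for c in range(4)]
            for r in range(4)]

def trace_out_bob(rho):
    """Alice's 2x2 reduced density matrix: sum over Bob's index b."""
    return [[sum(rho[2 * a + b][2 * a2 + b] for b in range(2)) for a2 in range(2)]
            for a in range(2)]

s = 1 / math.sqrt(2)
epr = [s, 0, 0, s]  # amplitudes of |00>, |01>, |10>, |11>

rho_ab = density_matrix(epr)
rho_a = trace_out_bob(rho_ab)
for row in rho_a:
    print([round(entry.real, 6) for entry in row])
```

The result is the maximally mixed state I/2. The off-diagonal entry `rho_ab[0][3]` equals 1/2 in the joint state, while the corresponding entry of $\rho_A \otimes \rho_B$ is 0, which is one way to see that the two differ.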

The Schmidt decomposition is a very useful way to write bipartite pure states, and allows us to
easily calculate the local density matrices of Alice and Bob. It says the following: for every bipartite
pure state $|\phi\rangle$ there is a unique integer d (called the Schmidt rank of $|\phi\rangle$), an orthonormal set of
states $|a_1\rangle,\ldots,|a_d\rangle$ for Alice’s space, an orthonormal set of states $|b_1\rangle,\ldots,|b_d\rangle$ for Bob’s space, and
positive reals $\lambda_1,\ldots,\lambda_d$ whose squares sum to 1, such that


$$|\phi\rangle = \sum_{i=1}^{d} \lambda_i |a_i\rangle|b_i\rangle. \tag{18.1}$$


For example, an EPR-pair has Schmidt coefficients $\lambda_1 = \lambda_2 = 1/\sqrt{2}$ and hence has Schmidt rank 2.
The Schmidt rank and the Schmidt coefficients of a state $|\phi\rangle$ are unique, but there is some freedom
in the choice of bases if the $\lambda_i$ are not all distinct. For example


$$\frac{1}{\sqrt{2}}(|0\rangle|0\rangle + |1\rangle|1\rangle) \quad\text{and}\quad \frac{1}{\sqrt{2}}(|+\rangle|+\rangle + |-\rangle|-\rangle)$$
are two distinct Schmidt decompositions of the EPR-pair.

The existence of the Schmidt decomposition is shown as follows. Let $\rho_A = \mathrm{Tr}_B(|\phi\rangle\langle\phi|)$ be Alice’s
local density matrix. This is Hermitian, so it has a spectral decomposition $\rho_A = \sum_{i=1}^{d} \mu_i|a_i\rangle\langle a_i|$
with orthonormal eigenvectors $|a_i\rangle$ and positive real eigenvalues $\mu_i$. Note that d is the rank of $\rho_A$,
and $\sum_i \mu_i = \mathrm{Tr}(\rho_A) = 1$. Then there are $c_{ij}$ such that


$$|\phi\rangle = \sum_{i,j=1}^{d} \sqrt{\mu_i}\, c_{ij} |a_i\rangle|j\rangle,$$
where the $|j\rangle$ are the computational basis states for Bob’s space. Define $\lambda_i = \sqrt{\mu_i}$ and $|b_i\rangle =
\sum_j c_{ij}|j\rangle$. This gives the decomposition of $|\phi\rangle$ of Eq. (18.1). It only remains to show that $\{|b_i\rangle\}$ is
an orthonormal set, which we do as follows. The density matrix version of Eq. (18.1) is


$$|\phi\rangle\langle\phi| = \sum_{i,j=1}^{d} \lambda_i\lambda_j |a_i\rangle\langle a_j| \otimes |b_i\rangle\langle b_j|.$$


We know that if we trace out the B-part from $|\phi\rangle\langle\phi|$, then we should get $\rho_A = \sum_i \lambda_i^2|a_i\rangle\langle a_i|$, but
that can only happen if $\langle b_j|b_i\rangle = \mathrm{Tr}(|b_i\rangle\langle b_j|) = 1$ for i = j and $\langle b_j|b_i\rangle = 0$ for $i \neq j$. Hence the
$|b_i\rangle$ form an orthonormal set. Note that from Eq. (18.1) it easily follows that Bob’s local density
matrix is $\rho_B = \sum_i \lambda_i^2|b_i\rangle\langle b_i|$.
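
For two qubits, the Schmidt coefficients can be computed in closed form from the eigenvalues of $\rho_A$. The sketch below is an illustration for that special case only (the function name `schmidt_coefficients` is chosen here). It uses the fact that $\rho_A = MM^\dagger$, where M is the 2 x 2 matrix of amplitudes, so $\mathrm{Tr}(\rho_A) = 1$ and $\det(\rho_A) = |\det M|^2$ determine both eigenvalues.

```python
import math

def schmidt_coefficients(c00, c01, c10, c11):
    """Schmidt coefficients of c00|00> + c01|01> + c10|10> + c11|11> (assumed normalized)."""
    det_m = c00 * c11 - c01 * c10   # determinant of the 2x2 amplitude matrix M
    det_rho = abs(det_m) ** 2       # det(rho_A) = |det M|^2
    disc = math.sqrt(max(0.0, 1 - 4 * det_rho))
    mus = [(1 + disc) / 2, (1 - disc) / 2]  # eigenvalues of rho_A (its trace is 1)
    # Keep only the positive eigenvalues; their number is the Schmidt rank.
    return [math.sqrt(mu) for mu in mus if mu > 1e-12]

s = 1 / math.sqrt(2)
epr_coeffs = schmidt_coefficients(s, 0, 0, s)               # EPR-pair
product_coeffs = schmidt_coefficients(0.5, 0.5, 0.5, 0.5)   # |+>|+>, a product state

print([round(c, 6) for c in epr_coeffs], len(epr_coeffs))
print([round(c, 6) for c in product_coeffs], len(product_coeffs))
```

The EPR-pair has two equal coefficients (Schmidt rank 2), while the product state has a single coefficient 1 (Schmidt rank 1).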


18.4 The impossibility of perfect bit commitment 


Key distribution is just one of the many tasks cryptographers would like to solve. Another important 
primitive is bit commitment. In this scenario there is no eavesdropper, but Alice and Bob don’t 
trust each other. Suppose Alice has a bit b which for the time being she doesn’t want to reveal 
to Bob, though she would like to somehow convince Bob that she has already made up her mind 
about b and won’t change its value later. A protocol for bit commitment comes in two stages, each 
of which may involve several rounds of communication: 


1. In the “commit” phase Alice gives Bob a state which is supposed to commit her to the value 
of b (without informing Bob about the value of b). 


2. In the “reveal” phase Alice sends b to Bob, and possibly some other information to allow him 
to check that this is indeed the same value b that Alice committed to before. 


A protocol is binding if Alice can’t change her mind, meaning she can’t get Bob to “open” 1 — b. 
A protocol is concealing if Bob cannot get any information about b before the “reveal” phase.5

A good protocol for bit commitment would be a very useful building block for many other 
cryptographic applications. For instance, it would allow Alice and Bob (who still don’t trust each 
other) to jointly flip a fair coin. Maybe they’re going through a divorce, and need to decide who 
gets to keep their joint car. Alice can’t just flip the coin by herself because Bob doesn’t trust her 
to do this honestly, and vice versa. Instead, Alice would pick a random coin b and commit to it.
Bob would then pick a random coin c and send it to Alice. Alice then reveals b, and the outcome of
the coin flip is defined to be $b \oplus c$. As long as at least one of the two parties follows this protocol,
the result will be a fair coin flip. 
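The message flow of this coin-flipping protocol can be sketched in code. The sketch below is only an illustration: it uses a hash-based classical commitment (SHA-256 of a random nonce followed by the bit) as a stand-in for the commitment step, which is an assumption of ours and is at best computationally secure, so it is not a perfect bit commitment in the sense of this section. The function names are hypothetical.

```python
import hashlib
import secrets

def commit(bit):
    """Commit phase: return (commitment sent to Bob, opening that Alice keeps secret)."""
    nonce = secrets.token_bytes(32)
    digest = hashlib.sha256(nonce + bytes([bit])).hexdigest()
    return digest, (nonce, bit)

def verify(commitment, opening):
    """Reveal phase: Bob checks that the opening matches the earlier commitment."""
    nonce, bit = opening
    return hashlib.sha256(nonce + bytes([bit])).hexdigest() == commitment

def coin_flip():
    b = secrets.randbelow(2)             # Alice picks a random coin b
    commitment, opening = commit(b)      # Alice -> Bob: commitment to b
    c = secrets.randbelow(2)             # Bob -> Alice: random coin c
    if not verify(commitment, opening):  # Alice -> Bob: reveal b, Bob checks it
        raise ValueError("Alice's opening does not match her commitment")
    return b ^ c                         # outcome is b XOR c

print(coin_flip())
```

The order of the messages is the point here: Bob chooses c only after receiving the commitment, and Alice reveals b only after receiving c.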

Perfect coin flipping (and hence also perfect bit commitment) are known to be impossible in 
the classical world. After BB84 there was some hope that perfect bit commitment (and hence also 
perfect coin flipping) would be possible in the quantum world, and there were some seemingly- 
secure proposals for quantum protocols to achieve this. Unfortunately it turns out that there is no 
quantum protocol for bit commitment that is both perfectly binding and perfectly concealing. 

⁵ A good metaphor to think about this: in the commit phase Alice locks b inside a safe which she sends to Bob.
This commits her to the value of b, since the safe is no longer in her hands. During the reveal phase she sends Bob
the key to the safe, who can then open it and learn b.

To show that a protocol for perfect bit commitment is impossible, consider the joint pure
state $|\phi_b\rangle$ that Alice and Bob would have if Alice wants to commit to bit-value b, and they both
honestly followed the protocol. If the protocol is perfectly concealing, then the reduced density
matrix on Bob’s side should be independent of b, i.e., $\mathrm{Tr}_A(|\phi_0\rangle\langle\phi_0|) = \mathrm{Tr}_A(|\phi_1\rangle\langle\phi_1|)$. The way we
constructed the Schmidt decomposition in the previous section now implies that there exist Schmidt
decompositions of $|\phi_0\rangle$ and $|\phi_1\rangle$ with the same $\lambda_i$’s and the same $b_i$’s: there exist orthonormal bases
$\{a_i\}$ and $\{a'_i\}$ such that


$$|\phi_0\rangle = \sum_{i=1}^{d} \lambda_i |a_i\rangle|b_i\rangle \quad\text{and}\quad |\phi_1\rangle = \sum_{i=1}^{d} \lambda_i |a'_i\rangle|b_i\rangle.$$


Now Alice can locally switch from $|\phi_0\rangle$ to $|\phi_1\rangle$ by just applying on her part of the state the map
$|a_i\rangle \mapsto |a'_i\rangle$. Alice’s map is unitary because it takes one orthonormal basis to another orthonormal
basis. But then the protocol is not binding at all: Alice can still freely change her mind about the 
value of b after the “commit” phase is over! Accordingly, if a quantum protocol for bit commitment 
is perfectly concealing, it cannot be binding at all. 
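Alice’s cheating strategy can be illustrated numerically. The following is a small sketch under our own assumptions: the Schmidt coefficients and the three random orthonormal bases are arbitrary illustrative choices, and the helper names are hypothetical. It builds two states with the same Schmidt coefficients and the same vectors on Bob’s side, and checks that a unitary on Alice’s side alone maps one to the other.

```python
import numpy as np

def random_orthonormal_basis(dim, rng):
    """Columns of the returned matrix form a random orthonormal basis of C^dim."""
    m = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, _ = np.linalg.qr(m)
    return q

def schmidt_state(lam, a, b):
    """sum_i lam[i] |a_i>|b_i>, where a_i and b_i are the columns of a and b."""
    return sum(l * np.kron(a[:, i], b[:, i]) for i, l in enumerate(lam))

def bob_reduced_state(state, dim_a, dim_b):
    """Trace out Alice's part of |state><state|."""
    c = state.reshape(dim_a, dim_b)
    return c.T @ c.conj()

rng = np.random.default_rng(1)
d = 3
lam = np.array([0.8, 0.5, np.sqrt(0.11)])   # illustrative coefficients, squares sum to 1
a, a_prime, b = (random_orthonormal_basis(d, rng) for _ in range(3))
phi0 = schmidt_state(lam, a, b)             # joint state if Alice committed to 0
phi1 = schmidt_state(lam, a_prime, b)       # joint state if Alice committed to 1

cheat = a_prime @ a.conj().T                # Alice's map sum_i |a'_i><a_i|
switched = np.kron(cheat, np.eye(d)) @ phi0 # acts on Alice's part only
print(np.allclose(switched, phi1))
```

Bob’s local density matrix is the same for both states, so he cannot notice the switch.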


18.5 More quantum cryptography 


Quantum cryptography is by now a pretty large subset of the area of quantum information and 
computation. Here we just briefly mention a few other topics in quantum crypto (see [69]): 


• There are quantum protocols for bit commitment that are partially concealing and partially
binding—something which is still impossible in the classical world. A primitive called “weak 
coin flipping” can be implemented almost perfectly in the quantum world, and cannot be 
implemented at all in the classical world. 


• Under assumptions on the fraction of dishonest players among a set of k parties, it is possible
to implement secure multi-party quantum computation. This is a primitive that allows the
players to compute any function of their k inputs, without revealing more information to
player i than can be inferred from its input plus the function value.


• One can actually do nearly perfect bit commitment, coin flipping, etc., assuming the dishonest
party has bounded quantum storage, meaning that it can’t keep large quantum states coherent
for longer times. At the present state of quantum technology this is a very reasonable assumption
(though a breakthrough in physical realization of quantum computers would wipe
out this approach).


• In device-independent cryptography, Alice and Bob want to solve certain cryptographic tasks
like key distribution or randomness generation without trusting their own devices (for instance 
because they don’t trust the vendor of their apparatuses). Roughly speaking, the idea here 
is to use Bell-inequality violations to prove the presence of entanglement, and then use this 
entanglement for cryptographic purposes. Even if Alice or Bob’s apparatuses have been 
tampered with, they can still only violate things like the CHSH inequality if they actually 
share an entangled state. 


• Experimentally it is much easier to realize quantum key distribution than general quantum
computation, because you basically just need to prepare qubits (usually photons) in either the
computational or the Hadamard basis, send them across a channel (usually an optical fibre,
but sometimes free space), and measure them in either the computational or the Hadamard
basis. Many sophisticated experiments have already been done. Somewhat surprisingly, you
can already commercially buy quantum key distribution machinery. Unfortunately the implementations
are typically not perfect (for instance, we don’t have perfect photon sources or
perfect photon detectors), and once in a while another loophole is exposed in the implementation,
which the vendor then tries to patch, etc.


⁶ The assumption that the state is pure rather than mixed is without loss of generality.


Exercises 


1. Here we will consider in more detail the information-disturbance tradeoff for measuring a 
qubit in one of the four BB84 states (each of which occurs with probability 25%). 


(a) Suppose Eve measures the qubit in the orthonormal basis given by $\cos(\theta)|0\rangle + \sin(\theta)|1\rangle$
and $\sin(\theta)|0\rangle - \cos(\theta)|1\rangle$, for some parameter $\theta \in [0, \pi/4]$. The first basis vector corresponds
to output 0, the second to output 1. For each of the four possible BB84 states,
give the probabilities of outcome 0 and outcome 1 (so your answer should consist of
8 numbers, each of which is a function of $\theta$).

(b) What is the average probability that Eve’s measurement outcome equals the encoded
bit $a_i$, as a function of $\theta$? (The average is taken both over the uniform distribution over the
four BB84 states, and over the probabilities calculated in part (a).)


(c) By what angle does the state change in each of the 8 cases of (a)? 


2. (a) What is the Schmidt rank of the state $\frac{1}{2}(|00\rangle + |01\rangle + |10\rangle + |11\rangle)$?


(b) Suppose Alice and Bob share k EPR-pairs. What is the Schmidt rank of their joint 
state? 


(c) Prove that a pure state $|\phi\rangle$ is entangled if, and only if, its Schmidt rank is greater than 1.

3. Give the Schmidt decomposition of the state $\frac{1}{2}(|0\rangle_A|0\rangle_B + |0\rangle_A|1\rangle_B + |1\rangle_A|1\rangle_B + |1\rangle_A|2\rangle_B)$.
Here Alice’s space has dimension 2, and Bob’s space has dimension 3. It suffices if you write
down your Schmidt decomposition, being explicit about the values of the $\lambda_i$’s and what are
the states $|a_i\rangle$ and $|b_i\rangle$. You can add your calculation (involving local density matrices etc.)
as a justification, but you don’t have to.


4. Consider a density matrix $\rho$ on Alice’s Hilbert space. A bipartite pure state $|\psi\rangle_{AB}$ is called a
purification of $\rho$, if $\rho = \mathrm{Tr}_B(|\psi\rangle\langle\psi|)$. The B-register in $|\psi\rangle_{AB}$ is called the purifying register.

(a) Show that an EPR-pair is a purification of the 1-qubit mixed state $\rho = I/2$.


(b) Show that if $\rho$ is a density matrix of rank r, then there exists a purification of $\rho$ where
the purifying register has at most $\lceil \log r \rceil$ qubits.


(c) Show that if $|\psi\rangle_{AB}$ and $|\psi'\rangle_{AB}$ are purifications of the same $\rho$, then there exists a unitary
U on Bob’s space such that $|\psi'\rangle_{AB} = (I \otimes U)|\psi\rangle_{AB}$.


5. Suppose Alice has a 1-qubit state $\rho$.


(a) Suppose Alice chooses a uniformly random Pauli matrix (see Appendix A.9) and applies
it to $\rho$. What is the resulting density matrix, averaged over the four cases?

(b) Suppose Alice and Bob shared a uniformly distributed secret 2-bit string ab, which is
unknown to Eve. How can Alice send $\rho$ to Bob over a public quantum channel, without
leaking any information to Eve (i.e., the quantum state sent over the channel should by
itself be independent of $\rho$), in such a way that Bob can recover $\rho$?


6. (H) Suppose we have a qubit in mixed state $\rho$ that we want to hide from Alice and Bob
individually, but in such a way that if Alice and Bob cooperate, then they can recover $\rho$.
Describe how we can change $\rho$ into some other 1-qubit state $\rho'$, what secret keys we give to
Alice and Bob, why individually they can get no information about $\rho$ from the qubit $\rho'$, and
why jointly they can fully recover the qubit in state $\rho$ from $\rho'$. The keys should be classical.


7. (H) Prove that Alice cannot give information to Bob by doing a unitary operation on her 
part of an entangled pure state. 


8. Suppose Alice sends two n-bit messages $M_1$ and $M_2$ with the one-time pad scheme, reusing
the same n-bit key K. Show that Eve can now get some information about $M_1, M_2$ from
tapping the classical channel.


9. (a) (H) Consider a bipartite pure state shared between Alice and Bob, where Alice and 
Bob’s local spaces have dimension d each: 


$$\sum_{i,j \in \{1,\ldots,d\}} \alpha_{ij}\, |i\rangle_A |j\rangle_B.$$


The state is given to you classically, as a list of $d^2$ amplitudes, each described by O(d)
bits. Give a classical polynomial-time algorithm to find the Schmidt coefficients and to 
find Alice and Bob’s basis for a Schmidt decomposition. 


(b) Give a classical polynomial-time algorithm that decides whether a given bipartite pure 
state (given as in (a)) is entangled or not. 
Comment: If the given state were mixed instead of pure, this decision problem is known to be NP-hard
and hence probably not polynomial-time solvable.


Chapter 19


Quantum Machine Learning 


19.1 Introduction 


Machine learning tries to extract patterns and regularities from given data for the purposes of 
prediction and understanding. In a slogan, one could say: ML = data + optimization. The data is 
what you learn from; the optimization finds a good model or hypothesis for the given data, which 
hopefully has some generalization power. ML has gone through several ups and downs over the 
years, but currently is booming thanks to the success of so-called “deep learning,” based on neural 
networks.¹ ML is often subdivided into three subareas, depending on the data one has:


1. In supervised learning we are given labelled data, for instance pictures of animals annotated 
with the kind of animal that’s on the picture, and we want to learn how to predict the label. 


2. In unsupervised learning we are just given unlabeled data, and need to find patterns in it. The 
canonical example is the clustering problem, where we are given unlabelled data items that 
we want to group into “similar” subsets. For example, it could be that our data consists of 
pictures of different kinds of animals (not labeled with the type of animal), and we somehow 
want to cluster the cat-pictures together, the wolf-pictures together, etc. We may or may not 
know in advance what the number of clusters should be. 


3. In reinforcement learning the learner actually interacts with the environment, receiving rewards
or penalties for desirable or undesirable behavior, and tries to learn from this interactive
data to behave more successfully in the environment. This is roughly how a child learns.²


It is a very interesting question to see how quantum computing changes and helps machine learn- 
ing. Here the learner would be a quantum computer, and the data may be classical or quantum. 
Quantum ML is by now a rather large area, and in this chapter we will go over a few representa- 
tive results and methods for supervised and unsupervised learning, mostly with classical output. 
See [101] for quantum applications to reinforcement learning, and [57, 27, 222] for much more. 


¹ Machine learning based on neural networks has been studied for decades but quite suddenly became much more
successful starting around 2012, due to the availability of more data, stronger computing hardware (incl. special- 
purpose GPUs for fast parallel matrix-vector calculations), and better software to do the training. 

² It’s also how a computer can learn to play games. One of the big breakthroughs of machine learning was in 2016
when the AlphaGo program learned to play the game of Go so well that it beat one of the world’s best human Go
players. Computers have been better than humans at chess already since the late 1990s (the IBM program “Deep
Blue” beat Kasparov in 1997), but Go was viewed as a much more complicated game than chess.


19.2 Supervised learning from quantum data


19.2.1 The PAC model of learning 


Let us first describe a mathematical model of what it means to learn from labeled data. This is 
Valiant’s PAC model [238], for “probably approximately correct” learning (see [223, 189] for more). 

Assume for simplicity that the labels are just binary: 0 or 1. Our goal is to learn a Boolean
function $f : \mathcal{X} \to \{0,1\}$ from examples of the form (x, f(x)), where $x \in \mathcal{X}$. A typical case would
be $\mathcal{X} = \{0,1\}^n$. The last bit f(x) of the example is called the label. Think for instance about the
case where we are given $1000 \times 1000$-pixel black-and-white pictures (n = 1,000,000) whose labels
f(x) indicate whether x is the picture of a wolf or not. We would like to learn f, or some good
approximation of it, to be able to recognize pictures of wolves in the future. Some x’s are more
important and more likely to appear as examples than others: many $1000 \times 1000$-grids don’t depict
anything. The assumption in PAC learning is that the examples are generated (independent and
identically distributed) according to some distribution D on $\mathcal{X}$. The idea is that this D represents
“the world” or “Nature,” which provides us with examples. We assume f cannot be completely 
arbitrary (in that case there would be an f consistent with every possible sequence of labeled 
examples) but comes from some known “concept class” C of Boolean functions. For instance, C 
could be a set of small logical formulas f on n Boolean variables, or a set of small-depth or small-size 
decision trees on n input bits, or neural networks with a restricted number of nodes or depth. 

A learning algorithm should generate a “hypothesis” $h : \mathcal{X} \to \{0,1\}$ that has small error
compared to the unknown f that we’re trying to learn, measured under the same distribution D
that generated the data.³ The generalization error of h w.r.t. the target function f is defined as


$$\mathrm{err}_D(f,h) = \Pr_{x \sim D}[f(x) \neq h(x)].$$


This error measures how well we’ve generalized the examples, and how well we can predict the labels 
of future examples. We say that h is “approximately correct” if this error is small, at most some
specified $\varepsilon$. The goal in PAC learning is to output an h that is probably approximately correct:


Definition 4 An $(\varepsilon,\delta)$-PAC learner for a concept class C w.r.t. distribution D on $\mathcal{X}$, is an algorithm
that receives m labeled examples $(x_1, f(x_1)), \ldots, (x_m, f(x_m))$ for a target function $f \in C$,
where each $x_i \sim D$, and that outputs a hypothesis h such that


$$\Pr[\mathrm{err}_D(f,h) \leq \varepsilon] \geq 1 - \delta.$$


The learning algorithm has to satisfy the above for every possible target function $f \in C$, and the
probability is over both the choice of the examples and over the internal randomness of the algorithm.

An $(\varepsilon,\delta)$-PAC learner for a concept class C is an algorithm that is an $(\varepsilon,\delta)$-PAC learner for C
w.r.t. every possible distribution D.
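To make the ingredients of this definition concrete, here is a toy sketch under assumptions of our own choosing: the concept class is the set of monotone conjunctions on n = 4 bits, D is the uniform distribution, and the learner simply outputs some concept that is consistent with the examples. None of these choices come from the text above, and the function names are hypothetical.

```python
import itertools
import random

def make_conjunction(subset):
    """Monotone conjunction: f(x) = AND of x[i] over i in subset."""
    return lambda x: int(all(x[i] == 1 for i in subset))

def generalization_error(f, h, dist):
    """Exact err_D(f, h) = Pr_{x~D}[f(x) != h(x)], with D given as {x: probability}."""
    return sum(p for x, p in dist.items() if f(x) != h(x))

def consistent_learner(concept_class, examples):
    """Return some concept that agrees with every labeled example (x, f(x))."""
    for c in concept_class:
        if all(c(x) == y for x, y in examples):
            return c
    raise ValueError("no concept in the class is consistent with the examples")

n = 4
points = list(itertools.product([0, 1], repeat=n))
uniform = {x: 1 / len(points) for x in points}           # the distribution D
subsets = [s for r in range(n + 1) for s in itertools.combinations(range(n), r)]
concept_class = [make_conjunction(s) for s in subsets]   # the concept class C

target = make_conjunction((0, 2))                        # the unknown f in C
rng = random.Random(0)
sample = [(x, target(x)) for x in rng.choices(points, k=20)]  # m = 20 i.i.d. examples
h = consistent_learner(concept_class, sample)
print("generalization error of h:", generalization_error(target, h, uniform))
```

Rerunning with different seeds and sample sizes m gives a feel for the two parameters: $\varepsilon$ bounds the error of the output h, and $\delta$ bounds the fraction of runs in which that error bound fails.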


Note that the first part of the definition is about learners that are only required to work correctly
for one specific distribution D (for instance, the uniform distribution over $\mathcal{X}$), while the second
part is “distribution-independent”: here we want a learner that works well irrespective of what
(unknown) distribution D generates the data. This is in keeping with the usual attitude towards
algorithms in computer science: these should work well even for a worst-case input. We don’t
require the class H of possible hypotheses h to equal the class C of possible target functions f (if
we add this requirement, then it’s called proper PAC learning). This allows us for instance to use
neural networks to learn target functions that come from some other class C, say logical formulas.

³ It is important to be taught and tested according to the same distribution D. Imagine a quantum-computing
course whose lectures focused on the mathematics of quantum algorithms, but with an exam that focuses on physics
questions about how to implement qubits and gates; that would clearly be very unreasonable.

The number of examples m that a particular learning algorithm uses is called its “sample 
complexity,” and the overall time or number of elementary operations it takes to output h is its 
“time complexity.” Clearly the latter upper bounds the former, since we need at least one operation 
to process one example. The sample complexity of a concept class C (as a function of $\varepsilon, \delta$) is the
minimal sample complexity among all PAC learners for C. Ideally, a good learner for C has both 
small sample complexity and small time complexity (say, polynomial in n). For some concept classes 
C efficient distribution-independent PAC learners exist, for example the class of logical formulas 
in k-Conjunctive Normal Form (i.e., each f would be the AND of several ORs, each of at most k 
variables or negated variables) or the class of regular languages (with the added help of so-called 
“membership queries”), but there are also many C that are not efficiently learnable. 


19.2.2 Learning from quantum examples under the uniform distribution 


There are different ways to define learning from quantum data. One natural way, due to Bshouty and 
Jackson [71], is to replace each classical random example (x, f(x)), with x ~ D, by a superposition. 
Focusing on the typical case $\mathcal{X} = \{0,1\}^n$, a quantum example would be the (n + 1)-qubit state


$$\sum_{x \in \{0,1\}^n} \sqrt{D(x)}\, |x, f(x)\rangle.$$


Of course, the world doesn’t usually present us with quantum examples, in contrast to the abun- 
dance of classical data for machine learning. So this model is only relevant in special cases, for 
example if we have a physical experiment producing such states. 

One thing we could do with a quantum example is measure it in the computational basis, but 
that would just give us back a classical example (x, f(x)) with $x \sim D$. A more clever thing we can
do is Fourier sampling. Suppose D is the uniform distribution. Exercise 1 shows how to convert a
quantum example (with probability 1/2) into an n-qubit state where the labels are ±1-phases:


$$\frac{1}{\sqrt{2^n}} \sum_{x \in \{0,1\}^n} (-1)^{f(x)} |x\rangle.$$


If we apply n Hadamard gates to this state, then we get


$$\frac{1}{2^n} \sum_{s \in \{0,1\}^n} \sum_{x \in \{0,1\}^n} (-1)^{s \cdot x} (-1)^{f(x)} |s\rangle = \sum_{s \in \{0,1\}^n} \alpha_s |s\rangle.$$


If we measure this state, then we’ll see outcome $s \in \{0,1\}^n$ with probability $\alpha_s^2$. The amplitudes $\alpha_s$
are called the Fourier coefficients of the function $(-1)^{f}$, whence the name “Fourier sampling.”
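Fourier sampling is easy to simulate classically for small n. The sketch below assumes f is given as a truth table of length $2^n$ and computes the coefficients $\alpha_s$ by applying the (unnormalized) n-fold Hadamard matrix to the vector of signs $(-1)^{f(x)}$; the function names and the choice of the 3-bit majority function as an example are our own.

```python
import numpy as np

def fourier_coefficients(truth_table):
    """alpha_s = 2^{-n} sum_x (-1)^{s.x} (-1)^{f(x)}, where truth_table[x] = f(x) in {0, 1}."""
    signs = 1.0 - 2.0 * np.asarray(truth_table, dtype=float)  # (-1)^{f(x)}
    n = len(signs).bit_length() - 1
    h1 = np.array([[1.0, 1.0], [1.0, -1.0]])
    h = np.array([[1.0]])
    for _ in range(n):
        h = np.kron(h, h1)           # entries (-1)^{s.x}, without normalization
    return h @ signs / len(signs)

def fourier_sample(truth_table, rng, shots=1):
    """Draw outcomes s with probability alpha_s^2, as a measurement would."""
    probs = fourier_coefficients(truth_table) ** 2
    return rng.choice(len(probs), size=shots, p=probs)

# Example: f = majority of 3 bits; x is encoded as an integer 0..7.
majority = [int(bin(x).count("1") >= 2) for x in range(8)]
alpha = fourier_coefficients(majority)
print(alpha ** 2)
print(fourier_sample(majority, np.random.default_rng(0), shots=10))
```

The squared coefficients sum to 1, as they must for measurement probabilities.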


In some cases Fourier sampling gives a lot of information about the f we’re trying to learn.

Learning linear functions. A perfect illustration of Fourier sampling is for the following class:


$$C = \Big\{ f_a \;\Big|\; a \in \{0,1\}^n,\ \forall x : f_a(x) = a \cdot x \bmod 2 = \sum_{i=1}^{n} a_i x_i \bmod 2 \Big\},$$

these are the linear functions modulo 2. It is easy to calculate that if we do Fourier sampling
on a quantum example for function $f_a$, then $\alpha_a = 1$ and $\alpha_s = 0$ for all $s \neq a$. So one Fourier
sample already tells us what a is! Hence we can learn $f_a$ exactly (i.e., with $\varepsilon = 0$), with high
probability, using O(1) examples and O(n) elementary gates. In contrast, learning linear functions
from classical examples under the uniform distribution requires $\Theta(n)$ examples (see Exercise 2).
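For completeness, the calculation of the Fourier coefficients of a linear function can be spelled out as follows. It only uses the definition of $\alpha_s$ and the fact that the sum over $x \in \{0,1\}^n$ factorizes over the n bits of x; here $s \oplus a$ denotes the bitwise XOR of s and a.

```latex
\begin{align*}
\alpha_s &= \frac{1}{2^n}\sum_{x\in\{0,1\}^n}(-1)^{s\cdot x}(-1)^{a\cdot x}
          = \frac{1}{2^n}\sum_{x\in\{0,1\}^n}(-1)^{(s\oplus a)\cdot x} \\
         &= \prod_{i=1}^{n}\Big(\frac{1}{2}\sum_{x_i\in\{0,1\}}(-1)^{(s_i\oplus a_i)x_i}\Big)
          = \begin{cases} 1 & \text{if } s=a,\\ 0 & \text{if } s\neq a.\end{cases}
\end{align*}
```

Each factor equals 1 if $s_i = a_i$ and 0 otherwise, so the product is nonzero only when s and a agree in every bit.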


Learning DNF. A richer concept class that can be learned efficiently from uniform quantum 
examples is the class of s-term Disjunctive Normal Form (DNF) formulas on n Boolean variables. 
These are formulas of the form $f(x) = (x_1 \wedge \neg x_3) \vee (x_2 \wedge x_3 \wedge x_5)$, i.e., an OR of up to s different
ANDs of variables or negations of variables. The concept class C of s-term DNF is not known to 
be efficiently PAC learnable w.r.t. the uniform distribution D classically. However, Bshouty and 
Jackson [71] showed that s-term DNF can be learned in polynomial time (in s and n) from uniform 
quantum examples. Roughly speaking, they use Fourier sampling to produce a linear function that 
is weakly correlated with the target DNF function f, and then use a classical “boosting” algorithm 
to combine multiple such weak hypotheses into one good hypothesis h. We’ll skip the details here. 


19.2.3 Learning from quantum examples under all distributions 


We saw a few cases where quantum examples reduce the sample and/or time complexity of learning 
algorithms w.r.t. a fixed data-generating distribution D, namely uniform D. But in the PAC model 
we ideally want a distribution-independent learner that works well for every possible distribution D. 
Can allowing quantum instead of classical examples significantly reduce the sample complexity of 
learning a class C in the distribution-independent setting? It turns out the answer is ‘no’. 

Classically, the number of examples that is necessary and sufficient for $(\varepsilon,\delta)$-PAC learning a
concept class C is known to be [60, 132]


$$\Theta\left( \frac{\mathrm{VCdim}(C)}{\varepsilon} + \frac{\log(1/\delta)}{\varepsilon} \right), \tag{19.1}$$


where VCdim(C) is the so-called VC-dimension of C, named after Vapnik and Chervonenkis [241]
and defined as follows. We say that a set $S \subseteq \{0,1\}^n$ is shattered by C if for each of the $2^{|S|}$
possible labelings $\ell : S \to \{0,1\}$, there is a function $f \in C$ that has the same labeling of S (i.e.,
$f|_S = \ell$). VCdim(C) is the size of a largest S shattered by C. Intuitively, larger VC-dimension
corresponds to a more complex or “richer” (and hence harder to learn) concept class. We won’t 
prove the characterization of Eq. (19.1) here, but Exercises 4 and 5 go most of the way towards the 
claimed upper and lower bounds on m, respectively. 
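For very small concept classes the definition of shattering can be checked by brute force. The sketch below is meant purely as an illustration of the definition (it takes exponential time), and the two example classes, parities on 3 bits and thresholds on {0,...,4}, are our own choices.

```python
from itertools import combinations, product

def is_shattered(points, concept_class):
    """True if the concepts realize all 2^|S| labelings of the given points."""
    realized = {tuple(f(x) for x in points) for f in concept_class}
    return len(realized) == 2 ** len(points)

def vc_dimension(domain, concept_class):
    """Size of a largest shattered subset of the domain (brute force)."""
    best = 0
    for size in range(1, len(domain) + 1):
        if any(is_shattered(s, concept_class) for s in combinations(domain, size)):
            best = size
        else:
            break  # subsets of shattered sets are shattered, so larger sizes fail too
    return best

def make_parity(a):
    """Linear function f_a(x) = a.x mod 2."""
    return lambda x: sum(ai * xi for ai, xi in zip(a, x)) % 2

def make_threshold(t):
    """Threshold function on integers: 1 if x >= t, else 0."""
    return lambda x: int(x >= t)

cube = list(product([0, 1], repeat=3))
parities = [make_parity(a) for a in cube]
thresholds = [make_threshold(t) for t in range(6)]
print(vc_dimension(cube, parities), vc_dimension(list(range(5)), thresholds))
```

The parities on 3 bits shatter the three unit vectors but no set of 4 points (there are only 8 functions), while thresholds cannot realize the labeling (1, 0) on any pair x < y.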

It was proven in [29] that in fact the same formula Eq. (19.1) determines the number of quantum 
examples that are necessary and sufficient for learning C. The sufficiency is trivial: just measure the 
quantum examples and run the best classical PAC learner. The necessity was proved by reducing 
a quantum measurement problem to the problem of PAC learning C from quantum examples, and 
showing that the number of copies of the example-state required to solve that measurement problem 
is at least the expression of Eq. (19.1). So, up to constant factors, quantum examples are not more 
useful than classical examples for distribution-independent PAC learning.


19.2.4 Learning quantum states from classical data


One can generalize PAC learning from Boolean-valued to real-valued target functions $f : \mathcal{X} \to [0,1]$,
and then consider a hypothesis $h : \mathcal{X} \to [0,1]$ to be approximately correct (for some small $\gamma$) if


$$\mathrm{err}_{D,\gamma}(f,h) = \Pr_{x \sim D}\big[\,|f(x) - h(x)| > \gamma\,\big] \leq \varepsilon.$$


So now a good hypothesis h is supposed to be close to f (rather than equal) for most x.

An interesting example is the problem of learning an unknown n-qubit quantum state $\rho$ from
measurement data. Let $\mathcal{X}$ be the set of measurement elements, i.e., psd matrices M with $\|M\| \leq 1$.
If we measure $\rho$ with some POVM of which M is one element, then the probability to get the
outcome corresponding to M, is $\mathrm{Tr}(M\rho)$. Accordingly, we can define $f : \mathcal{X} \to [0,1]$ as $f(M) = \mathrm{Tr}(M\rho)$
and consider the class C of all such functions (one f for each possible $\rho$, so this class is
uncountable). Aaronson [1] showed that this C is classically PAC learnable from O(n) examples
of the form (x, f(x)) (with some polynomial dependence of the sample complexity on $\gamma$, $\varepsilon$, and
exponential time complexity). Note that we are not really learning $\rho$ itself, but rather learn to
predict the measurement probabilities. In contrast, learning a good approximation of $\rho$ itself (with
small error in trace distance) requires a number of copies of $\rho$ that is exponential in n [126, 197].
Some positive results for learning specific classes of quantum states can be found in [180, 20, 161]. 


19.3 Unsupervised learning from quantum data 


In this section we will look at an example of unsupervised learning from quantum data: dimension- 
reduction via Principal Component Analysis. Suppose we are given m vectors $v_1, \ldots, v_m \in \mathbb{R}^d$, say
unit vectors for simplicity. Let’s say the dimension d of the data-vectors is very large, and we would 
like to reduce it to some much smaller k, say at most k = polylog(d). Many machine learning tasks, 
for example clustering, become much easier if we can significantly reduce this dimension. 

One way to achieve this dimension-reduction is to find k suitable unit vectors $c_1, \ldots, c_k \in \mathbb{R}^d$
(which may or may not be in the set $\{v_i\}$ themselves), such that the projection $P_S v_i$ of the $v_i$’s
on the k-dimensional space $S = \mathrm{span}\{c_1, \ldots, c_k\}$ typically doesn’t lose much, i.e., $P_S v_i$ is close to
$v_i$ for most $i \in [m]$. Then we can replace each $v_i$ by the k-dimensional vector $P_S v_i = \sum_{j=1}^{k} \alpha_j c_j$,
expressed as the vector of coefficients $(\alpha_j) \in \mathbb{R}^k$ (note that $\alpha_j = \langle c_j|v_i\rangle$). How to find those k
“directions”? One method that often (though not always) works well is to find the k eigenvectors 
corresponding to the k largest eigenvalues of the following d x d “correlation matrix”: 


$$A = \sum_{i=1}^{m} v_i v_i^T.$$


Those k eigenvectors are called the k “principal components” of A. They intuitively correspond to 
the k most important directions in the data, and we can choose them for dimension-reduction. 
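As a point of reference, the classical procedure can be sketched in a few lines of NumPy. This is a minimal illustration with our own function names and synthetic data: unit vectors in $\mathbb{R}^6$ that lie close to a 2-dimensional subspace.

```python
import numpy as np

def top_k_principal_components(vectors, k):
    """Return the k largest eigenvalues of A = sum_i v_i v_i^T and their eigenvectors (as columns)."""
    v = np.asarray(vectors, dtype=float)        # shape (m, d), row i is v_i
    a = v.T @ v                                 # the d x d correlation matrix A
    eigvals, eigvecs = np.linalg.eigh(a)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]
    return eigvals[order], eigvecs[:, order]

def reduce_dimension(vectors, components):
    """Row i of the result holds the coefficients alpha_j = <c_j|v_i>."""
    return np.asarray(vectors, dtype=float) @ components

# Synthetic data: m = 200 unit vectors in R^6 near a random 2-dimensional subspace.
rng = np.random.default_rng(0)
subspace = np.linalg.qr(rng.normal(size=(6, 2)))[0]
data = rng.normal(size=(200, 2)) @ subspace.T + 0.01 * rng.normal(size=(200, 6))
data /= np.linalg.norm(data, axis=1, keepdims=True)

eigenvalues, components = top_k_principal_components(data, k=2)
coefficients = reduce_dimension(data, components)
worst_loss = np.max(np.linalg.norm(data - coefficients @ components.T, axis=1))
print(eigenvalues / len(data), worst_loss)
```

For unit vectors the matrix A/m has trace 1, so the printed ratios show which fraction of the total weight sits on the top-k directions.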
Classically, we can find those k eigenvectors by diagonalizing A, which takes time polynomial
in d. In the quantum case we can do something very different, under the (very strong) assumption
that we can efficiently, say in time polylog(d), prepare the $\lceil \log d \rceil$-qubit quantum states $|v_i\rangle$
corresponding to the vectors $v_i$. By choosing $i \in [m]$ uniformly at random and preparing $|v_i\rangle$, we
prepare the following $\lceil \log d \rceil$-qubit mixed state, which is proportional to the correlation matrix:


$$\rho = \frac{1}{m} \sum_{i=1}^{m} |v_i\rangle\langle v_i| = \frac{1}{m} A.$$

Let’s say this has (unknown) spectral decomposition $\rho = \sum_{j=1}^{d} \lambda_j |c_j\rangle\langle c_j|$ with $\lambda_1 \geq \cdots \geq \lambda_d \geq 0$,
where the first k eigenvalues sum to something close to 1, and are not too close together, at least
1/poly(k) apart.⁴ We would now like to find the top-k eigenstates $|c_1\rangle, \ldots, |c_k\rangle$ of this $\rho$.

Note that the unitary $U = e^{i\rho}$ has the same eigenstates as $\rho$ itself, with every eigenvalue $\lambda_j$ of
$\rho$ translating into eigenvalue $e^{i\lambda_j}$ of U. Lloyd et al. [174] (with more precise analysis and matching
lower bound in [153]) showed that we can actually implement the power $U^t$ up to error $\varepsilon$ using
$O(t^2/\varepsilon)$ copies of the state $\rho$ (see Exercise 7). We now use phase estimation with the unitary U
on a copy of $\rho$ itself, with additive error $\delta = 1/\mathrm{poly}(k)$. By Section 4.6, phase estimation with
additive error $\delta$ corresponds to running controlled versions of $U^t$ for t up to $O(1/\delta)$. Under our
earlier assumptions, this only takes poly(k) = polylog(d) time. Ignoring for simplicity the small
errors ($\leq \delta$) that phase estimation makes in estimating the values $\lambda_j$, phase estimation transforms
the copy of $\rho$ and a few auxiliary $|0\rangle$-qubits into the state


$$\sum_{j=1}^{d} \lambda_j\, |c_j\rangle\langle c_j| \otimes |\lambda_j\rangle\langle\lambda_j|.$$


If we measure the second register, then we obtain state $|c_j\rangle \otimes |\lambda_j\rangle$ with probability $\lambda_j$.⁵ Doing this
poly(k) many times, we learn the k largest values $\lambda_1, \ldots, \lambda_k$, and for each of those $\lambda_j$’s we’ll have a
number of copies of the eigenstate $|c_j\rangle$. This is a quantum form of Principal Component Analysis.

This collection of eigenstates determines a k-dimensional subspace on which we could re-express
the $v_i$’s (approximately), but it is not very explicit: we only have the k basis vectors of this space
as quantum states! Suppose we want to express some unit vector v (which again we assume we can
prepare efficiently as a state $|v\rangle$) as a linear combination of the $c_i$’s. One thing we can do is use a
few copies of each $|c_i\rangle$ to approximate $|\langle c_i|v\rangle|^2$ for each i using the SWAP-test, which gives us at
least partial information about the coefficients $\langle c_i|v\rangle$ (see Exercise 8).
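The SWAP-test itself is easy to simulate for small dimensions. The sketch below assumes the standard circuit (a Hadamard on a control qubit, a controlled-SWAP of the two registers, and another Hadamard on the control) and computes the exact probability of measuring 0 on the control, which equals $\frac{1}{2} + \frac{1}{2}|\langle c|v\rangle|^2$. The function name is our own; on real hardware this probability would have to be estimated from repeated runs, each of which consumes one copy of each state.

```python
import numpy as np

def swap_test_probability_zero(c, v):
    """Exact probability that the SWAP-test on |c>|v> outputs 0 on the control qubit."""
    c = np.asarray(c, dtype=complex)
    v = np.asarray(v, dtype=complex)
    d = len(c)
    swap = np.zeros((d * d, d * d))
    for i in range(d):
        for j in range(d):
            swap[j * d + i, i * d + j] = 1.0        # SWAP |i>|j> = |j>|i>
    identity = np.eye(d * d)
    zeros = np.zeros((d * d, d * d))
    hadamard = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
    h_on_control = np.kron(hadamard, identity)
    controlled_swap = np.block([[identity, zeros], [zeros, swap]])
    state = np.kron([1.0, 0.0], np.kron(c, v))      # control qubit starts in |0>
    state = h_on_control @ controlled_swap @ h_on_control @ state
    return float(np.linalg.norm(state[: d * d]) ** 2)  # weight on control = 0

c = np.array([1.0, 0.0])
v = np.array([1.0, 1.0]) / np.sqrt(2)
p0 = swap_test_probability_zero(c, v)
print(p0, 2 * p0 - 1)   # 2*p0 - 1 estimates |<c|v>|^2
```

Note that the test only reveals the squared magnitude of the overlap, not its sign or phase, which is why the information about the coefficients is only partial.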

“Quantum PCA” has a lot of drawbacks, but at least it shows some genuinely quantum tricks 
that we can use under the assumption that our input vectors can be efficiently prepared as quantum 
states. There have also been some quantum approaches for the prominent unsupervised learning 
problem of clustering, but we will not describe those here (see for instance [173, 150]). 


19.4 Optimization 


In the previous two sections we assumed quantum data: either the data is already given as a 
superposition, or we can efficiently put given classical data in superposition. However, in most 
real-world applications of machine learning we have classical data without the means to efficiently 
make this quantum. Remembering the slogan ML = data + optimization, if there’s any room 
left for quantum improvements when data is classical, it would be in the optimization to find a 
well-fitting model for the data. We’ll look at some examples where quantum computing might help. 


⁴ All these assumptions make this principal-component analysis a heuristic method for clustering, not something
that provably always works. It is anyway not clear in this case what a correct or optimal output would be: because 
this is unlabeled data, we do not have a clear standard for correctness. 

⁵ Actually, in the second register we will see (with high probability) a $\delta$-close approximation of $\lambda_j$ rather than
$\lambda_j$ itself, but if we assume $\delta$ is much smaller than the spacing between the eigenvalues $\lambda_1, \ldots, \lambda_k$, then such a
$\delta$-approximation is good enough to “recognize” the state in the first register as the jth principal component.


19.4.1 Variational quantum algorithms


One approach that has received a lot of attention is to optimize over parametrized circuits. Suppose 
we have a quantum circuit $U(\theta)$ with a vector $\theta$ of parameters. This could for instance be a circuit
where CNOTs and single-qubit rotations are already in place, but the angles of the single-qubit
gates are parameters that we can tweak. This $U(\theta)$ is then applied to a fixed starting state, say $|0\rangle$,
yielding a final state $|\psi(\theta)\rangle = U(\theta)|0\rangle$. The goal is now to minimize the expected value of some
observable M, i.e., to find a $\theta$ that minimizes the function $f(\theta) = \langle\psi(\theta)|M|\psi(\theta)\rangle$. In the case of
supervised learning applications, $U(\theta)$ could for instance represent some hypothesis (i.e., a way to
predict labels of x’s), M could incorporate the given labeled examples (x, f(x)), and $f(\theta)$ could be
the “empirical error”: the fraction of mis-predicted labels among the given examples.

Note that $f(\theta)$ can be computed approximately (for classically given $\theta$) on a quantum computer
by repeatedly preparing $|\psi(\theta)\rangle$ and measuring the observable M. If the circuits $U(\theta)$ are relatively
simple (say, few qubits, few gates, low depth) and M is relatively easy to measure (say, a sum
of a few n-qubit Pauli matrices with few non-identity terms) then this could already be done
on a relatively small and simple quantum computer. Variational quantum algorithms (VQAs)
are typically hybrid classical-quantum algorithms: the minimization over $\theta$ is usually done by a
classical outer loop that iteratively improves $\theta$. Using the ability to approximately compute f we
can for instance try to do approximate gradient descent (move $\theta$ by some step-size in the direction
of steepest descent of f) or some other method. This is analogous to the iterative way the weights
in neural networks are optimized, and these variational quantum approaches are sometimes (with 
a keen sense for marketing) called “quantum neural networks” or “quantum deep learning.” For 
combinatorial optimization, a very structured version of the variational approach is the Quantum 
Approximate Optimization Algorithm (QAOA) [109]. See [78] for a general overview of VQAs. 

One interesting application of this variational idea is in trying to find the smallest eigenvalue of a 
given Hamiltonian H. For example, H could describe the energy of a chemical system as a function 
of the locations of the particles (nuclei and electrons) of the system; the smallest eigenvalue of H 
would be the “ground-state energy” of the system, which is an important quantity in chemistry. We 
know from Chapter 14 that in general this problem of determining or even well-approximating this 
ground state energy is QMA-hard, even in the special case where H is a sum of 2-local terms, so in 
general this shouldn’t be efficiently solvable on a quantum computer. However, suppose that from 
some general physics or chemistry intuition we have a rough idea of what the ground state of our 
particular Hamiltonian H should look like, something we can prepare using a simple parametrized 
circuit $U(\theta)$. The set of states $|\psi(\theta)\rangle = U(\theta)|0\rangle$ that we are limiting ourselves to is called an
“Ansatz” (German for “approach” or “attempt”). We can now try to optimize the parameters $\theta$
in order to minimize the expected value $f(\theta) = \langle\psi(\theta)|H|\psi(\theta)\rangle$, i.e., the energy of the state $|\psi(\theta)\rangle$.
This approach is called the “variational quantum eigensolver” (VQE) [199], and is one of the best 
hopes for applying smallish, near-term quantum computers to problems in chemistry. 
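
As a rough illustration of the VQE idea, the sketch below minimizes the energy of a one-qubit toy Hamiltonian over a one-parameter Ansatz and compares the result with the exact smallest eigenvalue. Everything specific here is an assumption made for illustration: the Hamiltonian $H = 0.5Z + 0.8X$, the Ansatz $R_y(\theta)|0\rangle$, and the names `ansatz`, `energy` and `vqe`. The energy is computed exactly rather than estimated from measurements, which a real device could not do.

```python
import math

# Toy Hamiltonian 0.5*Z + 0.8*X as a real symmetric 2x2 matrix (illustrative choice).
H = [[0.5, 0.8],
     [0.8, -0.5]]

def ansatz(theta):
    """|psi(theta)> = R_y(theta)|0> = cos(theta/2)|0> + sin(theta/2)|1>."""
    return [math.cos(theta / 2), math.sin(theta / 2)]

def energy(theta):
    """f(theta) = <psi(theta)|H|psi(theta)> for the real-amplitude ansatz."""
    psi = ansatz(theta)
    h_psi = [sum(H[r][c] * psi[c] for c in range(2)) for r in range(2)]
    return sum(psi[r] * h_psi[r] for r in range(2))

def vqe(theta=0.0, steps=500, step_size=0.1, delta=1e-4):
    """Gradient descent on the energy, with a central finite-difference gradient."""
    for _ in range(steps):
        grad = (energy(theta + delta) - energy(theta - delta)) / (2 * delta)
        theta -= step_size * grad
    return theta, energy(theta)

theta_opt, e_min = vqe()
# The eigenvalues of [[a, b], [b, -a]] are +-sqrt(a^2 + b^2).
exact_ground = -math.sqrt(0.5 ** 2 + 0.8 ** 2)
```

Because this Hamiltonian is real and symmetric, its ground state has real amplitudes and therefore lies inside the chosen Ansatz; with a poorly chosen Ansatz the minimum found would lie strictly above the true ground-state energy.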


19.4.2 Some provable quantum speed-ups for optimization 


The variational approach is rather heuristic: it very much depends on how good the “Ansatz” (the 
choice of the class of parametrized circuits $U(\theta)$) happens to be for the particular problem at hand.
Here we mention some other approaches, which yield provable (albeit usually only polynomial) 
quantum speed-ups under some assumptions on how the input is given. 


• There are many quantum speed-ups for optimization problems on graphs, typically using
Grover search (Section 7.2), Grover-based minimum-finding (Exercise 7.10), amplitude
amplification (Section 7.3), or amplitude estimation (Exercise 7.8) as a subroutine. Examples
are finding shortest paths [102] and approximating minimum cuts or graph sparsification [24].


• Solving linear systems and other basic linear algebra is ubiquitous in classical optimization
algorithms. Since quantum states are vectors and quantum operations are matrices, one 
can try to improve such classical algorithms using quantum algorithms. Examples are phase 
estimation (Section 4.6), the block-encoding approach (Section 9.4 and [120, Section 3.2.4]), 
and HHL (Chapter 10). The trouble with this approach is that it often assumes the input 
is a quantum state (which is not always practical) and/or that it produces the output as a 
quantum state (which is not always useful). For example, HHL and quantum PCA have both 
features. See [2] for more discussion. 


One interesting application of “quantum linear algebra” (with classical inputs and outputs!) 
is the quantum recommendation system of Kerenidis and Prakash [151], which can generate 
recommendations of type “you might also like” to a user of systems like Amazon or Netflix, 
based on the user’s and other users’ earlier behavior. Initially [151] was believed to give 
an exponential speed-up over classical recommendation systems, until Tang showed how to 
“dequantize” their quantum algorithm under similar classical access assumptions [234, 81]. 


• In convex optimization we minimize a convex function $f : \mathbb{R}^n \to \mathbb{R}$, either over all $x \in \mathbb{R}^n$
or over all $x$ that are constrained to lie in some convex domain $\mathcal{X} \subseteq \mathbb{R}^n$. This covers a big
part of continuous optimization. Convexity ensures that the only local minima are also global
minima, but methods for convex optimization often still work to find good local minima for non-convex problems
(such as training neural networks). Iterative first-order methods like gradient descent use 
the gradient of f at a given point, which in some cases can be computed more efficiently by 
quantum algorithms [146, 121, 95] (see Exercise 6). Second-order methods often solve a linear 
system involving the Hessian (the $n \times n$ matrix of partial second derivatives at a given point),
and we can try to use quantum linear algebra. If the matrix is symmetric and diagonally 
dominant and the output needs to be classical, then we could use the linear solver of [24]. 


Quantum algorithms are known for the specific cases of linear programming (LPs) and 
semidefinite programming (SDPs) [64, 22, 63, 23, 21], for learning support vector machines 
(SVMs) [203, 221, 213, 9, 214], and for least-squares linear regression with an $\ell_1$-regularizer [80].


Exercises 


1. Suppose that for some unknown Boolean function $f : \{0,1\}^n \to \{0,1\}$ and amplitude-vector
$(\alpha_x)_{x\in\{0,1\}^n}$, you are given one copy of the $(n+1)$-qubit state

\[
\sum_{x\in\{0,1\}^n} \alpha_x |x\rangle|f(x)\rangle.
\]

Show how you can convert this into the state

\[
\sum_{x\in\{0,1\}^n} \alpha_x (-1)^{f(x)} |x\rangle|1\rangle
\]

with success probability 1/2, in such a way that you know when you succeeded.


2. Consider again the concept class $\mathcal{C}$ of linear functions mod 2.


(a) Give a classical learning algorithm to learn a linear function exactly with high success
probability ($\varepsilon = 0$, $\delta = 1/3$) using $O(n)$ uniform random examples and $O(n^3)$ time.

(b) Argue that every classical PAC learner for $\mathcal{C}$ under uniform $D$, with $\varepsilon < 1/4$, needs $\Omega(n)$
examples.


3. In the model of exact learning with membership queries, the goal is to exactly learn a target
function $f \in \mathcal{C}$ from queries to $f$ (so there are no examples in this setting, or rather the
learner can choose their own examples).

Show that if $\mathcal{C}$ is the concept class of linear functions, then a target function $f \in \mathcal{C}$ can be
learned with 1 quantum membership query, but requires $\Omega(n)$ classical membership queries.


4. Consider a concept class $\mathcal{C}$ of functions $f : \mathcal{X} \to \{0,1\}$, with $|\mathcal{X}| = N$ and $\mathrm{VCdim}(\mathcal{C}) = d$.


(a) Consider the following simple (and probably not very time-efficient) learning algorithm:


Draw $m$ examples for target function $f$; output an $h \in \mathcal{C}$ consistent with these examples.


Let $h \in \mathcal{C}$ be a function with $\mathrm{err}_D(f,h) > \varepsilon$. Show that at the end of this algorithm,
the probability that $h$ is still consistent with the $m$ examples is $< (1-\varepsilon)^m$.

(b) Set $m = \lceil \log(3|\mathcal{C}|)/\log(1/(1-\varepsilon)) \rceil$. Show that with probability $\geq 2/3$, the only $h$ that
are consistent with the $m$ examples have $\mathrm{err}_D(f,h) \leq \varepsilon$.

(c) Derive an upper bound $m = O(d\log(N)/\varepsilon)$ on the classical sample complexity of $(\varepsilon, 1/3)$-PAC
learning the class $\mathcal{C}$ using Sauer’s lemma, which says that $|\mathcal{C}| \leq \sum_{i=0}^{d} \binom{N}{i}$.


5. Suppose the set $S = \{x_1,\ldots,x_d\} \subseteq \mathcal{X}$ is shattered by concept class $\mathcal{C}$. Consider a distribution
$D$ that puts $1 - 4\varepsilon$ probability on $x_1$ and $4\varepsilon/(d-1)$ probability on each of $x_2,\ldots,x_d$.


(a) Let $f \in \mathcal{C}$ be the target function. Show that you need $\Omega((d-1)/\varepsilon)$ examples $\sim D$ to see
(with probability $\geq 2/3$) $(x_i, f(x_i))$ for at least 50% of the $i \in \{2,\ldots,d\}$.


(b) Show that the sample complexity of every $(\varepsilon, 1/3)$-PAC learner for the class $\mathcal{C}$ is at least
$\Omega((d-1)/\varepsilon)$.


6. This exercise is about efficiently finding the gradient $\nabla f(z)$ of a function $f : \mathbb{R}^d \to \mathbb{R}$ at
a point $z \in \mathbb{R}^d$. The gradient is the $d$-dimensional real vector of the $d$ partial derivatives
$\partial f/\partial x_i$, evaluated at the point $z$.


(a) (H) Let $f(x) = a + bx$ be a linear function from $\mathbb{R}$ to $\mathbb{R}$, where the real number $b \in (0,1)$
can be written with $n$ bits of precision. Suppose we have a unitary $O_f$ that maps
$|x, 0\rangle \mapsto |x, f(x)\rangle$ (assume we have enough qubits to write down $x$ and $f(x)$). Give a
quantum algorithm to compute $b$ using one application of $O_f$ and one application of
$O_f^{-1}$, and some unitaries that do not depend on $f$.

(b) Let $f(x_1,\ldots,x_d) = a + b_1x_1 + \cdots + b_dx_d$ be a linear function from $\mathbb{R}^d$ to $\mathbb{R}$, where
$a, b_1,\ldots,b_d \in \mathbb{R}$. Show that the gradient $\nabla f(z)$ is equal to $(b_1,\ldots,b_d)$ for every $z \in \mathbb{R}^d$.

(c) Assume that for the function $f$ in (b), each coefficient $b_i$ is $\in [0,1)$ and can be written
with $n$ bits of precision. Suppose we have a unitary $O_f$ that maps
$|x_1,\ldots,x_d, 0\rangle \mapsto |x_1,\ldots,x_d, f(x_1,\ldots,x_d)\rangle$. Give a quantum algorithm that computes the gradient $\nabla f(z)$
using one application of $O_f$ and $O_f^{-1}$ and some unitaries that do not depend on $f$.

Comment: The quantum algorithm of (c) is somewhat reminiscent of the Bernstein-Vazirani algorithm
(Section 2.4.2), though that one is for functions over $\mathbb{F}_2$ rather than $\mathbb{R}^d$. Variants of the algorithm of (c) can also
be applied to efficiently approximate the gradient of a sufficiently smooth non-linear function $f : \mathbb{R}^d \to \mathbb{R}$,
since a smooth $f$ can be well-approximated at a given point $z \in \mathbb{R}^d$ by a linear function whose coefficients are
the entries of the gradient $\nabla f(z)$.


7. (H) Let $\sigma$ and $\rho$ be $k$-qubit mixed states, $\varepsilon > 0$ small, $t > 0$, and $U = e^{i\rho}$ be a unitary. Our
goal in this exercise is to apply $U^t$ to $\sigma$ (with error $\leq \varepsilon$ in trace norm) at the expense of using
some copies of the state $\rho$.


(a) Let $V$ be the 2-qubit SWAP-gate (which maps $|a\rangle|b\rangle \mapsto |b\rangle|a\rangle$ for all $a,b \in \{0,1\}$). Show
that $V^{-2\eta/\pi} = e^{-i\eta}e^{i\eta V}$ for all $\eta > 0$.

(b) Let $W$ be the $2k$-qubit unitary that swaps the first $k$ qubits with the last $k$ qubits,
i.e., it maps $|a\rangle|b\rangle \mapsto |b\rangle|a\rangle$ for all $a,b \in \{0,1\}^k$. Show that for all $\eta > 0$, $e^{i\eta W}$ can be
implemented with $k$ 2-qubit gates.

(c) Show that for small $\eta > 0$, $U^{\eta}\sigma U^{-\eta} = \sigma + i\eta(\rho\sigma - \sigma\rho) + E$, where $\|E\|_1 = O(\eta^2)$.

(d) Let $\sigma'$ be the $k$-qubit local state of the first register after applying the unitary $e^{i\eta W}$ to
the $2k$-qubit state $\sigma \otimes \rho$. Show that $\sigma' = \sigma + i\eta(\rho\sigma - \sigma\rho) + E'$, where $\|E'\|_1 = O(\eta^2)$.

(e) Show that $\|\sigma' - U^{\eta}\sigma U^{-\eta}\|_1 = O(\eta^2)$.

(f) Show that you can implement $U^t$ on $\sigma$ with error $\varepsilon$ in trace norm, using $O(t^2/\varepsilon)$ copies
of $\rho$ and $O(kt^2/\varepsilon)$ elementary gates.


8. Suppose $|\phi\rangle$ and $|\psi\rangle$ are unknown $n$-qubit pure states.


(a) (H) Show how a quantum computer can estimate the overlap $|\langle\phi|\psi\rangle|$ (in absolute value)
up to additive error 1/100 using $O(1)$ given copies of $|\phi\rangle$ and $|\psi\rangle$, and $O(n)$ elementary
gates.


(b) Assume the inner product $\langle\phi|\psi\rangle$ is a real number. Show that $\| |\phi\rangle - |\psi\rangle \|^2 = 2 - 2\langle\phi|\psi\rangle$.


(c) Assume $\langle\phi|\psi\rangle$ is real and positive. Show how a quantum computer can estimate the
distance $\| |\phi\rangle - |\psi\rangle \|$ up to additive error 1/100 using $O(1)$ copies of $|\phi\rangle$ and $|\psi\rangle$, and
$O(n)$ gates.

(d) Can a quantum computer detect the difference between the two cases $|\psi\rangle = |\phi\rangle$ and
$|\psi\rangle = -|\phi\rangle$, given arbitrarily many copies of these two states? Explain your answer.


Chapter 20


Error-Correction and Fault-Tolerance 


20.1 Introduction 


When Shor’s algorithm had just appeared in 1994, most people (especially experimental physicists, 
who were very aware of the difficulties in manipulating subatomic particles) were extremely skeptical 
about the prospects of actually building a quantum computer. In their view, it would be impossible 
to avoid errors when manipulating small quantum systems, and such errors would very quickly 
overwhelm the computation, rendering it no more useful than classical computation. However, in 
the few years that followed, the theory of quantum error-correction and fault-tolerant computation 
was developed. This shows, roughly speaking, that if the error-rate per operation can be brought 
down to something reasonably small (say 1%), and the errors between different qubits are not 
very correlated, then we can actually do near-perfect quantum computing for as long as we want. 
Below we give a succinct and somewhat sketchy introduction to this important but complex area, 
just explaining the main ideas. See the surveys by Gottesman [123] and Terhal [235] for more (in 
particular the latter for the important “surface code,” which we won’t cover here). 


20.2 Classical error-correction 


In the early days of classical computing, errors were all over the place: memory-errors, errors in 
bits sent over a channel, incorrectly applied instructions, etc.¹ Nowadays hardware is much more
reliable, but we also have much better “software solutions” for errors, in particular error-correcting
codes. Such codes take a string of data and encode it in a larger string (the “codeword”), adding
a lot of redundancy so that a small fraction of errors on the codeword won’t be able to reduce the 
information about the encoded data. 

The simplest example is of course the repetition code. If we want to protect a bit b, we could 
repeat it three times: 

$b \mapsto bbb$.


If we want to decode the encoded bit b from the (possibly corrupted) 3-bit codeword, we just take 
the majority value of the 3 bits. 

¹ The name “bugs” actually comes from insects getting stuck inside the computer and causing errors.

Consider a very simple noise model: every bit is flipped (independently of the other bits) with
probability $p$. Then initially, before applying the code, $b$ has probability $p$ to be flipped. But if
we apply the repetition code, the probability that the majority-value of the three bits is different
from $b$ is the probability of 2 or 3 bitflips, which is $3p^2(1-p) + p^3 < 3p^2$. Hence the error-rate
has been reduced from $p$ to less than $3p^2$. If the initial error-rate $p_0$ was $< 1/3$, then the new
error-rate $p_1 < 3p_0^2$ is less than $p_0$ and we have made progress: the error-rate on the encoded bit
is smaller than the error-rate on the unencoded bits. If we’d like it to be even smaller, we could
concatenate the code with itself, i.e., repeat each of the three bits in the code three times, so the
codelength becomes 9. This would give error-rate $p_2 = 3p_1^2(1-p_1) + p_1^3 < 3p_1^2 < 27p_0^4$, giving a
further improvement. As we can see, as long as the initial error-rate $p_0$ was below 1/3, we can
reduce the error-rate to whatever we want: $k$ levels of concatenation encode one “logical bit” into
$3^k$ “physical bits,” but the error-rate for each logical bit has been reduced to at most $\frac{1}{3}(3p_0)^{2^k}$.² This is a
very good thing: if the initial error is below 1/3, then $k$ levels of concatenation increase the number
of bits exponentially (in $k$) but reduce the error-rate double-exponentially fast!
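
The recursion behind this bound is easy to check numerically. The following sketch assumes independent bitflips at every level of concatenation, as in the noise model above; the function names and the value $p_0 = 0.1$ are illustrative choices. It iterates $p \mapsto 3p^2(1-p) + p^3$ and compares the result with the bound $\frac{1}{3}(3p_0)^{2^k}$.

```python
def majority_error(p):
    """Probability that the majority of 3 independently flipped bits is wrong."""
    return 3 * p**2 * (1 - p) + p**3

def concatenated_error(p0, k):
    """Logical error rate after k levels of concatenating the 3-bit repetition code."""
    p = p0
    for _ in range(k):
        p = majority_error(p)
    return p

p0 = 0.1  # illustrative physical error rate, below 1/3
rates = [concatenated_error(p0, k) for k in range(5)]
bounds = [(3 * p0) ** (2 ** k) / 3 for k in range(5)]
for k, (rate, bound) in enumerate(zip(rates, bounds)):
    print(f"k = {k}: {3**k} physical bits, error rate {rate:.3e}, bound {bound:.3e}")
```

Each additional level triples the number of physical bits but roughly squares the error rate, which is the double-exponential decrease mentioned in the text.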

Typically, already a small choice of k gets the error-rate down to negligible levels. For example, 
suppose we want to protect some polynomial (in some n) number of bits for some polynomial 
number of time-steps, and our physical error-rate is some fixed $p_0 < 1/3$. Choosing $k = 2\log\log n$
levels of concatenation already suffices for this, because then $p_k \leq \frac{1}{3}(3p_0)^{2^k} \sim 2^{-(\log n)^2} = n^{-\log n}$
goes to 0 faster than any inverse polynomial. In that case, by the union bound, even the probability that
there exists an error anywhere among our polynomially many logical bits in polynomially many
time-steps, will be negligibly small. With this choice of $k$, each logical bit would be encoded in
$3^k = (\log n)^{2\log_2(3)}$ physical bits, so we only increase the number of bits by a polylogarithmic factor.


20.3 Quantum errors 


The need for error-correction is far greater for quantum computers than for classical computers, 
because “quantum hardware” is much more fragile than classical hardware. Unfortunately, error- 
correction is also substantially more difficult in the quantum world, for several reasons: 


• The classical solution of just repeating a state is not available in general in the quantum
world, because of the no-cloning theorem.


• The classical world has basically only bitflip-errors, while the quantum world is continuous
and hence has infinitely many different possible errors.


• Measurements that test whether a state is correct can collapse the state, losing information.


Depending on the specific model of errors that one adopts, it is possible to deal with all of these 
issues. We will consider the following simple error model. Consider quantum circuits with S 
qubits, and T time-steps; in each time-step, several gates on disjoint sets of qubits may be applied 
in parallel. After each time-step, at each qubit, independently from the other qubits, some unitary 
error hits that qubit with probability p. Note that we assume the gates themselves to operate 
perfectly; this is just a convenient technical assumption, since a perfect gate followed by errors on 
its outgoing qubits is the same as an imperfect gate. 


² With a bit more work we can show that this even works if the initial error-rate $p$ is only strictly less than 1/2
rather than $< 1/3$. See Exercise 9.

Let’s investigate what kind of (unitary) errors we could get on one qubit. Consider the four
Pauli matrices from Appendix A.9:

\[
I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad
X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad
Y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad
Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.
\]


These have an interpretation as possible errors: $I$ corresponds to no-error, $X$ is a bitflip-error, $Z$ is
a phaseflip-error, and $Y = iXZ$ is a phaseflip-error followed by a bitflip-error (and a global phase
of $i$, which doesn’t matter). These four matrices span the space of all possible $2 \times 2$ matrices, so
every possible error-operation $E$ on a qubit is some linear combination $E = \alpha_0 I + \alpha_1 X + \alpha_2 Y + \alpha_3 Z$
of the 4 Pauli matrices. More generally, every $2^k \times 2^k$ matrix can be written uniquely as a linear
combination of matrices that each are the tensor product of $k$ Pauli matrices.

Consider for example the error which puts a small phase $\phi$ on $|1\rangle$:

\[
E = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\phi} \end{pmatrix} = e^{i\phi/2}\cos(\phi/2)\, I - i e^{i\phi/2}\sin(\phi/2)\, Z.
\]

Note that for small $\phi$ most of the weight in this linear combination sits on $I$, which corresponds to
the fact that $E$ is close to $I$. The sum of squared moduli of the two coefficients is 1 in this case.
That’s not a coincidence: whenever we write a unitary as a linear combination of Pauli matrices,
the sum of squared moduli of the coefficients will be 1 (see Exercise 1).
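
The decomposition of this phase error can be verified with a few lines of code. The sketch below uses the standard fact that the Pauli coefficient of $P \in \{I, X, Y, Z\}$ in a $2 \times 2$ matrix $E$ equals $\mathrm{Tr}(PE)/2$; the helper name `pauli_coefficients` and the value $\phi = 0.1$ are illustrative assumptions.

```python
import cmath
import math

I = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]

def pauli_coefficients(E):
    """Return (a0, a1, a2, a3) with E = a0*I + a1*X + a2*Y + a3*Z.

    Uses a_P = Tr(P E) / 2, which holds because Tr(P Q) is 2 if P == Q
    and 0 otherwise, for Pauli matrices P and Q.
    """
    def trace_of_product(P, M):
        return sum(P[r][c] * M[c][r] for r in range(2) for c in range(2))
    return tuple(trace_of_product(P, E) / 2 for P in (I, X, Y, Z))

phi = 0.1  # a small phase, chosen for illustration
E = [[1, 0], [0, cmath.exp(1j * phi)]]
a0, a1, a2, a3 = pauli_coefficients(E)

# Sum of squared moduli of the coefficients; equals 1 for a unitary E.
norm_sq = sum(abs(a) ** 2 for a in (a0, a1, a2, a3))
```

Here `a0` and `a3` reproduce the two coefficients in the displayed formula, `a1` and `a2` vanish, and `abs(a0)` is close to 1 because $E$ is close to $I$.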

The fact that all 1-qubit errors are linear combinations of I, X,Y, Z, together with the linearity 
of quantum mechanics, implies that if we can correct bitflip-errors (X), phaseflip-errors (Z), and 
their product ($Y$), then we can correct all possible unitary errors on a qubit.³ So typically, quantum
error-correcting codes are designed to correct bitflip and phaseflip-errors (their product is then 
typically also correctable), and all other possible errors are then also handled without further work. 

Our noise model does not explicitly consider errors on multiple qubits that are not a product 
of errors on individual qubits. However, even such a joint error on, say, k qubits simultaneously 
can still be written as a linear combination of products of k Pauli matrices. So also here the main 
observation applies: if we can just correct bitflip and phaseflip-errors on individual qubits, then we 
can correct all possible errors! 


20.4 Quantum error-correcting codes 


Quantum error-correcting codes encode a number of “logical qubits” into a larger number of “phys- 
ical qubits,” in such a way that errors on some number of its qubits can be corrected. The first and 
simplest is Peter Shor’s 9-qubit code [227], which encodes 1 logical qubit into 9 physical qubits, 
and can correct an error on any one of the 9 physical qubits. Here are the codewords for the two
logical basis states:

\[
|0\rangle \mapsto |\overline{0}\rangle = \frac{1}{\sqrt{8}}(|000\rangle + |111\rangle)(|000\rangle + |111\rangle)(|000\rangle + |111\rangle)
\]
\[
|1\rangle \mapsto |\overline{1}\rangle = \frac{1}{\sqrt{8}}(|000\rangle - |111\rangle)(|000\rangle - |111\rangle)(|000\rangle - |111\rangle)
\]

These two quantum codewords $|\overline{0}\rangle$ and $|\overline{1}\rangle$ span a 2-dimensional space $\{\alpha|\overline{0}\rangle + \beta|\overline{1}\rangle\}$. This
2-dimensional subspace of the overall $2^9$-dimensional space is called the “codespace.”

³ We can even correct the non-unitary errors that arise from undesired interaction between qubits of our circuit
with the environment, but we won’t talk about such errors here.

Suppose an error happens on one of these 9 qubits. We would like to have a procedure that 
maps the resulting state back to the codespace. By linearity, it suffices if we can do this for the 
basis states $|\overline{0}\rangle$ and $|\overline{1}\rangle$. First consider bitflip and phaseflip-errors.


Detecting a bitflip-error. If a bitflip-error occurs on one of the first 3 qubits, we can detect its
location by noting which of the 3 positions is the minority bit. We can do this for each of the
three 3-qubit blocks. Hence there is a unitary that writes down in 4 auxiliary qubits (which are all
initially $|0\rangle$) a number $e_b \in \{0,1,\ldots,9\}$. Here $e_b = 0$ means that no bitflip-error was detected, and
$e_b \in \{1,\ldots,9\}$ means that a bitflip-error was detected on qubit number $e_b$. Note that we don’t
specify what should happen if more than one bitflip-error occurred.


Detecting a phaseflip-error. To detect a phaseflip-error, we can consider the relative phase
for each of the three blocks $|000\rangle \pm |111\rangle$, and if they are not all the same, unitarily write down
in 2 more auxiliary qubits (again, initially $|0\rangle$) a number $e_p \in \{0,1,2,3\}$. Here $e_p = 0$ means that
no phaseflip-error was detected, and $e_p \in \{1,2,3\}$ means that a phaseflip-error was detected in the
$e_p$-th block.⁴


Together the above two procedures form one unitary $U$ (i.e., one circuit) that acts on $9+4+2 = 15$
qubits, and that “writes down” $e_b$ in 4 auxiliary qubits and $e_p$ in 2 auxiliary qubits. For example,
suppose we have the state $|\overline{0}\rangle$. Let $X_i$ denote a bitflip-error on the $i$-th qubit ($i \in [9]$), let $Z_j$ denote
a phaseflip-error on the $j$-th qubit, and let $j' \in [3]$ denote the number of the block in which qubit $j$
lies. Then after these errors our state is $X_iZ_j|\overline{0}\rangle$. After fresh auxiliary qubits $|0^4\rangle|0^2\rangle$ are added,
$U$ maps

\[
X_iZ_j|\overline{0}\rangle|0^4\rangle|0^2\rangle \mapsto X_iZ_j|\overline{0}\rangle|i\rangle|j'\rangle.
\]

Together, $e_b = i$ and $e_p = j'$ form the “error syndrome”; this tells us which error occurred where.
The error-correction procedure can now measure this syndrome in the computational basis, and
take corrective action depending on the classical outcomes $e_b$ and $e_p$: apply an $X$ to qubit $e_b$ (or
no $X$ if $e_b = 0$), and apply a $Z$ to one qubit in the $e_p$-th block (or no $Z$ if $e_p = 0$). The case of a
$Y$-error on the $i$-th qubit corresponds to the case where $i = j$ (i.e., the $i$-th qubit is hit by both a
phaseflip and a bitflip); our procedure still works in this case. Hence we can perfectly correct one
Pauli-error on any one of the 9 codeword qubits.
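
The correction procedure for Pauli errors can be simulated classically on the full $2^9$-dimensional state vector. The sketch below is an illustration under stated assumptions, not the circuit $U$ from the text: the syndromes are read off directly from the simulated amplitudes instead of being written into auxiliary qubits, the phaseflip syndrome is obtained from the expectation values of $X^{\otimes 6}$ on pairs of blocks (one common way to compare the relative phases of the blocks), and all function names are hypothetical.

```python
import itertools
import math

N = 9  # physical qubits, numbered 1..9; qubit q is bit (N - q) of a basis-state index

def logical_state(alpha, beta):
    """alpha|0_L> + beta|1_L> of Shor's 9-qubit code, as a list of 512 amplitudes."""
    state = [0.0] * (2 ** N)
    for blocks in itertools.product([0, 1], repeat=3):  # each block is 000 or 111
        index = 0
        for b in blocks:
            index = (index << 3) | (0b111 if b else 0)
        sign = (-1) ** sum(blocks)                      # sign pattern of |1_L>
        state[index] = (alpha + beta * sign) / math.sqrt(8)
    return state

def bit(index, q):
    return (index >> (N - q)) & 1

def apply_x(state, q):
    """Bitflip on qubit q: permute the basis states."""
    mask = 1 << (N - q)
    return [state[i ^ mask] for i in range(len(state))]

def apply_z(state, q):
    """Phaseflip on qubit q: negate amplitudes of basis states whose bit q is 1."""
    return [-a if bit(i, q) else a for i, a in enumerate(state)]

def bitflip_syndrome(state):
    """e_b in {0,...,9}: position of the minority bit within some block, or 0 if none.

    With at most one bitflip every basis state in the support gives the same answer,
    so inspecting a single one suffices in this simulation.
    """
    index = next(i for i, a in enumerate(state) if abs(a) > 1e-12)
    for block in range(3):
        bits = [bit(index, 3 * block + k) for k in (1, 2, 3)]
        if len(set(bits)) > 1:
            minority = 0 if sum(bits) == 2 else 1
            return 3 * block + bits.index(minority) + 1
    return 0

def block_parity(state, block_a, block_b):
    """<X^(x)6> on two blocks: +1 if their relative phases agree, -1 otherwise."""
    mask = (0b111 << (3 * (3 - block_a))) | (0b111 << (3 * (3 - block_b)))
    total = sum(state[i].conjugate() * state[i ^ mask] for i in range(len(state)))
    return round(total.real)

def phaseflip_syndrome(state):
    """e_p in {0,1,2,3}: the block whose relative phase differs, or 0 if none."""
    key = (block_parity(state, 1, 2), block_parity(state, 2, 3))
    return {(1, 1): 0, (-1, 1): 1, (-1, -1): 2, (1, -1): 3}[key]

def correct(state):
    """Compute the syndrome (e_b, e_p) and apply the corresponding X and Z corrections."""
    e_b, e_p = bitflip_syndrome(state), phaseflip_syndrome(state)
    if e_b:
        state = apply_x(state, e_b)
    if e_p:
        state = apply_z(state, 3 * e_p)  # any qubit of block e_p works; use its last one
    return state, (e_b, e_p)

psi = logical_state(0.6, 0.8)
corrupted = apply_x(apply_z(psi, 5), 2)   # the error X_2 Z_5
recovered, syndrome = correct(corrupted)
```

In this usage example `recovered` coincides with `psi`, and the syndrome identifies qubit 2 for the bitflip and block 2 for the phaseflip, even though the $Z$-correction is applied to qubit 6 rather than qubit 5.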

⁴ Note that we are not discovering exactly on which of the 9 qubits the phaseflip-error happened (in contrast to
the case of bitflips), but that’s OK: we can correct the phaseflip-error by applying a $Z$-gate to any one of the 3 qubits
in the block where the affected qubit sits.

As we argued before, the ability to correct Pauli-errors suffices to correct all possible errors.
Let’s see in more detail how this works. Consider for instance some 9-qubit unitary error $E$. Assume
it can be decomposed as a linear combination of 9-qubit products of Paulis, each having at most
one bitflip-error and one phaseflip-error:

\[
E = \Big(\alpha_0 I + \sum_{i=1}^{9} \alpha_i X_i\Big)\Big(\beta_0 I + \sum_{j=1}^{9} \beta_j Z_j\Big).
\]

Suppose this error occurs on $|\overline{0}\rangle$:

\[
E|\overline{0}\rangle = \Big(\alpha_0 I + \sum_{i=1}^{9} \alpha_i X_i\Big)\Big(\beta_0 I + \sum_{j=1}^{9} \beta_j Z_j\Big)|\overline{0}\rangle = \sum_{i,j=0}^{9} \alpha_i\beta_j X_iZ_j|\overline{0}\rangle,
\]

where we denote $X_0 = Z_0 = I$.
If we now add auxiliary qubits $|0^4\rangle|0^2\rangle$ and apply the above unitary $U$, then we go into a
superposition of error syndromes:

\[
U(E \otimes I^{\otimes 6})|\overline{0}\rangle|0^4\rangle|0^2\rangle = \sum_{i,j=0}^{9} \alpha_i\beta_j X_iZ_j|\overline{0}\rangle|i\rangle|j'\rangle.
\]

Measuring the 6 auxiliary qubits will now probabilistically give us one of the syndromes $|i\rangle|j'\rangle$, with
$i \in \{0,1,\ldots,9\}$ and $j' \in \{0,1,2,3\}$, and it will collapse the state to

\[
X_iZ_j|\overline{0}\rangle|i\rangle|j'\rangle.
\]

In a way, this measurement of the syndrome “discretizes” the continuously many possible errors to
the finite set of Pauli-errors. Once the syndrome has been measured, we can apply a corrective $X$
and/or $Z$ to the first 9 qubits to undo the specific error corresponding to the specific syndrome we
got as outcome of our measurement. It is also possible that the measurement outcome is $0^4, 0^2$; in
that case the state has collapsed to $|\overline{0}\rangle|0^4\rangle|0^2\rangle$, so the syndrome measurement itself already removed
the error!
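
The “discretization” effect can be seen in a much smaller setting than the 9-qubit code. The sketch below is a simplification and not the procedure from the text: it uses the 3-qubit bitflip code $\alpha|000\rangle + \beta|111\rangle$, a continuous error $\cos(t) I - i\sin(t) X$ on the first qubit, and a syndrome consisting of the two parities of neighbouring qubits. All names and the values $t = 0.3$, $\alpha = 0.6$, $\beta = 0.8$ are illustrative assumptions.

```python
import math

def encode(alpha, beta):
    """3-qubit bitflip code: alpha|000> + beta|111> (index 0 is 000, index 7 is 111)."""
    state = [0j] * 8
    state[0b000], state[0b111] = alpha, beta
    return state

def rotate_first_qubit(state, t):
    """Apply the unitary error cos(t) I - i sin(t) X to the first (most significant) qubit."""
    return [math.cos(t) * state[i] - 1j * math.sin(t) * state[i ^ 0b100] for i in range(8)]

def syndrome(index):
    """Parities (q1 xor q2, q2 xor q3) of a computational basis state."""
    q1, q2, q3 = (index >> 2) & 1, (index >> 1) & 1, index & 1
    return (q1 ^ q2, q2 ^ q3)

def measure_syndrome(state):
    """Return {syndrome: (probability, normalized post-measurement state)}."""
    branches = {}
    for s in {syndrome(i) for i in range(8)}:
        projected = [a if syndrome(i) == s else 0j for i, a in enumerate(state)]
        prob = sum(abs(a) ** 2 for a in projected)
        if prob > 1e-15:
            branches[s] = (prob, [a / math.sqrt(prob) for a in projected])
    return branches

t = 0.3
branches = measure_syndrome(rotate_first_qubit(encode(0.6, 0.8), t))
```

Only two syndromes occur. With probability $\cos^2(t)$ the outcome is $(0,0)$ and the state is exactly the original codeword, so the measurement itself removed the error; with probability $\sin^2(t)$ the outcome is $(1,0)$ and the state is the codeword with a full bitflip on the first qubit (up to a global phase), which a single $X$ undoes.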

So now we can correct an error on one qubit. To achieve this, however, we have substantially 
increased the number of locations where such an error could occur: the number of qubits has gone 
from 1 to 9 (even to 15 if we also count the 6 auxiliary qubits used for the syndrome measurements), 
and we need a number of time-steps to compute and measure the syndrome, and to correct a 
detected error. Hence this procedure only gains us something if the error-rate p is so small that 
the probability of 2 or more errors on the larger encoded system is smaller than the probability 
of 1 error in the unencoded qubit. We will get back to this issue below, when talking about the 
threshold theorem. Note also that each new application of the correction-procedure needs a new,
fresh 6-qubit register initialized to $|0^4\rangle|0^2\rangle$. After one run of the error-correction procedure these
auxiliary qubits will contain the measured error syndrome, and we can just discard this. In a way, 
error correction acts like a refrigerator: a fridge pumps heat out of its system and dumps it into the 
environment, and error-correction pumps noise out of its system and dumps it in the environment 
in the form of the discarded auxiliary qubits. 

The above 9-qubit code is just one example of a quantum error-correcting code. Better codes 
exist, and a lot of work has gone into simultaneously optimizing the different parameters: we want 
to encode a large number of logical qubits into a not-much-larger number of physical qubits, while 
being able to correct as many errors as possible. The shortest code that encodes one logical qubit 
and protects against one error, has five physical qubits. There are also “asymptotically good” 
quantum error-correcting codes; these encode k logical qubits into O(k) physical qubits and can 
correct errors on a constant fraction of the physical qubits (rather than just an error on one of the 
qubits).


20.5 Fault-tolerant quantum computation


Encoding a quantum state in a quantum error-correcting code to protect it against noise is good, 
but not enough: we also need to be able to do operations on the encoded qubits (Hadamards, 
CNOTs, etc.). One way is to decode the logical qubits, do the operation on them, and then re- 
encode them. This, however, is a recipe for disaster: if an error occurs in the interval between 
the decoding and subsequent encoding, then we’re unprotected and we cannot detect (let alone 
undo) errors happening during that interval. Accordingly, we need to be able to do operations on 
the logical qubits while they are encoded. Additionally, we need operations for regular stages of 
error-correction, i.e., measuring the syndrome and then correcting errors based on the outcomes 
of those measurements. These operations may also introduce errors, and the big worry is that 
error-correction steps may themselves introduce more errors than they correct.⁵

There is a 7-qubit code due to Steane [232] which is used often because it has some nice 
properties: a Hadamard on the logical qubit corresponds to $H^{\otimes 7}$ on the physical qubits, and a
CNOT between two logical qubits corresponds to applying CNOTs between the 7 pairs of the two
blocks of physical qubits (i.e., between the 1st qubit of one block and the 1st qubit of the other
block, etc.). Such implementations are called transversal. Adding the $T$-gate ($|b\rangle \mapsto e^{ib\pi/4}|b\rangle$) to $H$
and CNOT would yield a gate-set that suffices for universal quantum computation. Unfortunately, 
implementing the T-gate fault-tolerantly takes a lot more work, and we won’t go into that here 
(see Exercise 7, though). 

When designing schemes for fault-tolerant computing, it is very important to ensure that errors 
do not spread too quickly. Consider for instance a logical CNOT: if its control-bit is erroneous but 
its target bit is not, then after doing the CNOT both bits will be erroneous. The trick is to keep 
the errors on the physical qubits under control in such a way that regular stages of error-correction 
don’t get overwhelmed by the errors. For example, suppose we have a code that is able to correct 
up to one error in each encoded block (logical qubit); then the implementation of a logical CNOT 
may convert two encoded blocks where only one physical qubit has an error, into two blocks each 
of which has a single error, but not to multiple errors within one block, because our code will be 
able to handle two blocks with one error each but not one block with two errors (this is why the 
transversal implementation of the CNOT for Steane’s code is nice). In addition, we need to be able 
to fault-tolerantly prepare states, and measure logical qubits in the computational basis. We won’t 
go into the many further details of fault-tolerant quantum computing here. 


⁵ It’s like being inside a leaky boat on the open seas, using a leaky bucket to scoop out water all the time to prevent
the boat from filling up with water and sinking. It’s doable, but not easy.


20.6 Concatenated codes and the threshold theorem


The idea to concatenate a code with itself, described at the end of Section 20.2 for classical codes,
also applies to quantum codes as we will sketch now. Suppose we have some code that encodes
one qubit into $C$ qubits, suppose that it can correct one error on any one of its $C$ qubits, and
uses $D$ time-steps per stage of error-correcting (each time-step may involve a number of elementary
gates in parallel). Instead of only 1, we now have $CD$ locations where an error could occur!
Assuming error-rate $p$ per-qubit-per-time-step, the probability for the code to fail on a specific
logical qubit at a specific time (i.e., to have more than 1 physical error on its $CD$ locations) is

\[
p' = \sum_{i=2}^{CD} \binom{CD}{i} p^i (1-p)^{CD-i}.
\]

If $p$ is a sufficiently small constant, then this sum is dominated by
the term for $i = 2$, and we have $p' \approx (CD)^2p^2$. Accordingly, if the initial error-rate $p$ is below
some magical constant $\approx 1/(CD)^2$, then $p' < p$ and hence each level of error-correction reduces the
error-rate by a constant factor.
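
The failure probability $p'$ can be evaluated directly to see where encoding starts to help. The following sketch uses hypothetical values $C = 7$ and $D = 10$ (so $CD = 70$ error locations); these numbers and the function name `block_failure_probability` are illustrative assumptions, not parameters of any particular code.

```python
from math import comb

def block_failure_probability(p, locations):
    """Probability of 2 or more errors among `locations` independent error locations."""
    return sum(comb(locations, i) * p**i * (1 - p) ** (locations - i)
               for i in range(2, locations + 1))

C, D = 7, 10          # hypothetical code size and number of time-steps per correction stage
locations = C * D
for p in (1e-2, 1e-3, 1e-4, 1e-5):
    p_new = block_failure_probability(p, locations)
    print(f"p = {p:.0e}: p' = {p_new:.3e}, improved: {p_new < p}")
```

For these values the encoded error-rate $p'$ is smaller than $p$ only for the smaller choices of $p$, consistent with a threshold of order $1/(CD)^2$.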

More generally, suppose we concatenate this code $k$ times with itself. Then every “logical
qubit” gets encoded into $C^k$ qubits, but (by the same calculation as in Section 20.2) the error-rate
for each logical qubit gets reduced to $O((CDp)^{2^k})$. Suppose we want to be able to “survive”
$T = \mathrm{poly}(n)$ time-steps without any error on the logical qubits; that is what we would need to
run an efficient quantum algorithm on faulty quantum hardware. Then it suffices if we reduce the
error rate to $< 1/T$, for which $k = O(\log\log T)$ levels of concatenation are enough. These layers
of error-correction increase the number of qubits and the computation time by a factor which is
exponential in $k$, but that is still only a polylogarithmic overhead, since $2^{O(\log\log T)} = (\log T)^{O(1)}$.⁶
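
To get a feeling for how slowly the required number of levels grows with $T$, one can iterate the estimate $p' \approx (CD)^2 p^2$ from above. The sketch below treats that estimate as an exact recursion, which is an assumption, and again uses the hypothetical values $C = 7$, $D = 10$ and $p = 10^{-5}$; the function name `levels_needed` is illustrative.

```python
def levels_needed(p, c, target):
    """Smallest k with p_k < target, where p_0 = p and p_{j+1} = c * p_j**2.

    Requires c * p < 1 (error rate below the threshold 1/c); otherwise the
    recursion does not shrink the error rate.
    """
    if c * p >= 1:
        raise ValueError("error rate is not below the threshold 1/c")
    k = 0
    while p >= target:
        p = c * p * p
        k += 1
    return k

C, D = 7, 10                 # hypothetical code parameters
c = (C * D) ** 2             # coefficient in the estimate p' ~ (CD)^2 p^2
p = 1e-5                     # hypothetical physical error rate, below 1/c
for T in (10**6, 10**12, 10**24):
    k = levels_needed(p, c, 1 / T)
    print(f"T = {T:.0e}: k = {k} levels, {C**k} physical qubits per logical qubit")
```

Squaring the exponent of $T$ adds only about one more level of concatenation, which is the $k = O(\log\log T)$ behaviour stated in the text.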

The above sketch (when implemented precisely) gives us the famous “threshold theorem” [7, 
159]: if the initial error-rate of the quantum hardware can be brought down below some magical 
constant (known as the “fault-tolerance threshold”), then we can use software-solutions like quan- 
tum error-correcting codes and fault-tolerant computing to ensure that we can quantum compute 
for long periods of time without serious errors. Much research has gone into finding the best value 
for this fault-tolerance threshold. The more efficient our basic quantum error-correcting codes are 
(i.e., the smaller C and D), the higher (= better) the value of the threshold is. Currently the 
best rigorous estimates for the threshold are around 0.1%, but there is numerical evidence that 
even a few percent might be tolerable. This is actually one of the most important results in the 
area of quantum computing, and is the main answer to the skeptics mentioned at the start of the 
chapter: as long as experimentalists manage to implement basic operations within a few percent of 
error in a scalable way, then we should be able to build large-scale quantum computers.⁷ Currently
there seems to be no fundamental reason why we cannot do this; it is, however, an extremely hard 
engineering problem. 


Exercises 
1. (H) Let E be an arbitrary 1-qubit unitary. We know that it can be written as 
E = aol +ayX +a2Y + Q3Z, 
for some complex coefficients a;. Show that S |a;|? = 1. 


2. (a) Write the 1-qubit Hadamard transform H as a linear combination of the four Pauli 
matrices. 


(b) Suppose an H-error happens on the first qubit of $\alpha|\overline{0}\rangle + \beta|\overline{1}\rangle$ using the 9-qubit code. 
Give the various steps in the error-correction procedure that corrects this error. 


3. Give a quantum circuit for the encoding of Shor’s 9-qubit code, i.e., a circuit that maps 
$|00^8\rangle \mapsto |\overline{0}\rangle$ and $|10^8\rangle \mapsto |\overline{1}\rangle$. Explain why the circuit works. 


⁶ Recently it was shown that one can even bring the overhead down to $O(1)$ [110]. 
⁷ This is of course assuming our simple model of independent noise on each physical qubit is not too far off; if the 
noise can be correlated in devious ways it becomes much harder (though often still possible) to protect against. 


4. Shor’s 9-qubit code allows us to correct a bit flip and/or a phase flip on one of its 9 qubits. 
Below we give a 4-qubit code which allows us to detect a bitflip and/or a phaseflip. By this we 
mean that after the detection procedure we either have the original uncorrupted state back, 
or we know that an error occurred (though we do not know which one). The logical 0 and 1 
are encoded as: 


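$|\overline{0}\rangle = \frac{1}{2}(|00\rangle + |11\rangle) \otimes (|00\rangle + |11\rangle)$ 
$|\overline{1}\rangle = \frac{1}{2}(|00\rangle - |11\rangle) \otimes (|00\rangle - |11\rangle)$ 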


(a) Give a procedure (either as a circuit or as sufficiently-detailed pseudo-code) that detects 
a bitflip error on one of the 4 qubits of $\alpha|\overline{0}\rangle + \beta|\overline{1}\rangle$. 


(b) Give a procedure (either as a circuit or as sufficiently-detailed pseudo-code) that detects 
a phaseflip error on one of the 4 qubits of $\alpha|\overline{0}\rangle + \beta|\overline{1}\rangle$. 


(c) Does that mean that we can now detect any unitary 1-qubit error on one of the 4 qubits? 
Explain your answer. 


5. (H) Show that there cannot be a quantum code that encodes one logical qubit into 2k physical 
qubits while being able to correct errors on up to k of the physical qubits. 


6. Suppose we have a qubit whose density matrix is $\rho = \alpha_0 I + \alpha_1 X + \alpha_2 Y + \alpha_3 Z$, where 
$\alpha_0, \alpha_1, \alpha_2, \alpha_3$ are real coefficients and $I, X, Y, Z$ are the Pauli matrices. 


(a) Show that $\alpha_0 = 1/2$. 


(b) Depolarizing noise (of strength $p \in [0,1]$) acts on a qubit as follows: with probability 
$1 - p$ nothing happens to the qubit, and with probability $p$ the qubit is replaced by the 
“completely mixed state” of a qubit, whose density matrix is $I/2$. 

Show that depolarizing noise on the above qubit doesn’t change the coefficient $\alpha_0$, but 
shrinks each of $\alpha_1, \alpha_2, \alpha_3$ by a factor of $1 - p$. 


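7. Suppose we have a qubit $|\phi\rangle = \alpha|0\rangle + \beta|1\rangle$ to which we would like to apply a $T = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\pi/4} \end{pmatrix}$ 
gate, but for some reason we cannot. However, we have a second qubit available in state 
$\frac{1}{\sqrt{2}}(|0\rangle + e^{i\pi/4}|1\rangle)$, and we can apply a CNOT gate and an $S = \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix}$ gate. 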


(a) What state do we get if we apply a CNOT to the first and second qubit? 


(b) Suppose we measure the second qubit in the computational basis. What are the proba- 
bilities of outcomes 0 and 1, respectively? 


(c) Suppose the measurement yields 0. Show how we can get $T|\phi\rangle$ in the first qubit. 


(d) Suppose the measurement yields 1. Show how we can get $T|\phi\rangle$ in the first qubit, up to 
an (irrelevant) global phase. 


Comment: This way of implementing the T-gate is very helpful in fault-tolerant computing, where often 
CNOT and S are easy to do on encoded states but T is not. What this exercise shows is that we can prepare 
(encodings of) the so-called “magic state” $\frac{1}{\sqrt{2}}(|0\rangle + e^{i\pi/4}|1\rangle)$ beforehand (offline, assuming we can store them 
until we need them), and use those to indirectly implement a T-gate using only CNOT and S-gates. 


8. Consider a quantum error-correcting code that encodes k qubits (and $n - k$ $|0\rangle$s) into an 
n-qubit codeword state, via the unitary encoding map 


$U : |x, 0^{n-k}\rangle \mapsto |C(x)\rangle$, where $x \in \{0,1\}^k$, and $|C(x)\rangle$ need not be a basis state. 


A “weight-w Pauli error” is the tensor product of n Pauli matrices, of which at most w are 
not identity (e.g., something like $X \otimes I \otimes Z \otimes I \otimes I$ if $w = 2$ and $n = 5$). Suppose that there is 
a unitary map S on 3n qubits that can identify every weight-w Pauli error E on a codeword, 
by writing the name of E (the “error syndrome,” which we can think of as a 2n-bit string 
“E”, for example writing 00 for I, 10 for X, 01 for Z, 11 for Y) in a second register that’s 
initially $0^{2n}$. In other words, for every $x \in \{0,1\}^k$ and weight-w Pauli error E, this S maps 


$S : (E|C(x)\rangle)|0^{2n}\rangle \mapsto (E|C(x)\rangle)|\text{“}E\text{”}\rangle$. 


(a) Show that if x and y are k-bit strings, and E and F are weight-w Pauli errors, then the 
n-qubit states $E|C(x)\rangle$ and $F|C(y)\rangle$ are orthogonal unless both $x = y$ and $E = F$. 


(b) Prove the inequality $2^k \sum_{i=0}^{w} \binom{n}{i} 3^i \le 2^n$. 

Comment: This inequality implies a lower bound on the required number of qubits n, in terms of the 
number of encoded qubits k and the weight w of errors that you can correct, but you don’t need to 
derive that consequence. 


9. In this exercise we will see that the argument about concatenating the classical 3-bit code at 
the end of Section 20.2 still works if the initial error rate is $p \le 1/2 - \varepsilon$ for some $\varepsilon > 0$. 


(a) Define the function f as $f(p) = 3p^2(1 - p) + p^3$. Show that if $p = 1/2 - \varepsilon$ for some 
$\varepsilon \in [0, 1/6]$, then $f(p) \le 1/2 - (13/9)\varepsilon$. 


(b) Show that there is an $m = O(\log(1/\varepsilon))$ such that m levels of concatenation reduce the 
error to $p_m \le 1/3$. 


(c) Show that $m + k$ levels of concatenation reduce the error to something exponentially 
small in $2^k$. 


(d) How many bits are used to encode one logical bit in the scheme from (c)? 
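
The recursion behind the concatenation argument of the last exercise can be explored numerically. The following is a minimal sketch, not part of the notes: the helper names `concatenated_error` and `levels_needed` are made up for illustration, and the code only iterates the map $f(p) = 3p^2(1-p) + p^3$ from part (a). It does not replace the proofs asked for in the exercise.

```python
def f(p: float) -> float:
    """Probability that the majority of 3 independent bits is wrong,
    when each bit is wrong with probability p: 3p^2(1-p) + p^3."""
    return 3 * p**2 * (1 - p) + p**3


def concatenated_error(p: float, levels: int) -> float:
    """Error rate after `levels` levels of concatenation (f applied `levels` times)."""
    for _ in range(levels):
        p = f(p)
    return p


def levels_needed(p: float, target: float) -> int:
    """Smallest number of levels that brings the error rate down to at most `target`."""
    if not 0 <= p < 0.5:
        raise ValueError("need 0 <= p < 1/2, otherwise the error rate never shrinks")
    if target <= 0:
        raise ValueError("target must be positive")
    levels = 0
    while p > target:
        p = f(p)
        levels += 1
    return levels


if __name__ == "__main__":
    eps = 0.01  # illustrative choice: initial error rate 1/2 - eps
    m = levels_needed(0.5 - eps, 1 / 3)
    for k in range(4):
        print(m + k, concatenated_error(0.5 - eps, m + k))
```

Running the script for a few values of `eps` gives a feel for how the number of levels needed to get below 1/3 grows as the initial error rate approaches 1/2, and how quickly the error drops once it is below 1/3.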
Appendix A 


Some Useful Linear Algebra 


In this appendix we quickly introduce the basic elements of linear algebra, most of which will be 
used somewhere or other in these notes. 


A.1 Vector spaces 


A vector space V over a field $\mathbb{F}$ is a set of objects (called vectors) satisfying that if $v, w \in V$, then 
$cv + dw \in V$ for all $c, d \in \mathbb{F}$. In other words, V is closed under addition and scalar multiplication. 
A (linear) subspace W is a subset $W \subseteq V$ which is itself a vector space (i.e., closed under addition 
and scalar multiplication). For example, $V = \mathbb{C}^d$ is the d-dimensional complex vector space, which 
is the set of all column vectors of d complex numbers. The set $W \subseteq V$ of vectors whose first two 
entries are 0 is a subspace of V. As another example, the set $V = \{0,1\}^d$ of d-bit vectors, with 
entrywise addition modulo 2, is a linear space. The field here is $\mathbb{F}_2 = \{0,1\}$. The set $W \subseteq V$ of 
vectors whose first two entries are equal is a subspace of V. 

A set of vectors $v_1, \ldots, v_m \in V$ is linearly independent if the only way to get $\sum_{i=1}^m a_i v_i$ equal 
to the zero-vector 0 is to set $a_1 = \cdots = a_m = 0$. The span (over field $\mathbb{F}$) of a set of vectors 
$S = \{v_1, \ldots, v_m\} \subseteq V$ is the set span(S) of vectors that can be written as a linear combination 
$\sum_{i=1}^m a_i v_i$ (with coefficients $a_1, \ldots, a_m \in \mathbb{F}$). A basis for V is a linearly independent set S of vectors 
such that span(S) = V. One can show that all bases of V have the same size; this size is called 
the dimension of V. If we fix an ordered basis $S = (v_1, \ldots, v_m)$, then every $w \in V$ can be written 
uniquely as a linear combination $\sum_{i=1}^m w_i v_i$, and can also be written (in that basis) as $(w_1, \ldots, w_m)$. 
The support of such a w is the set $\{i \mid w_i \neq 0\}$ of locations where w is nonzero. For example, the 
support of $w = (1, 5, 0, 4)$ is $\{1, 2, 4\}$. 
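
The two examples above (the subspace of d-bit vectors whose first two entries are equal, and the support of a vector) are small enough to check by brute force. The sketch below is illustrative only; the function names are invented here, and the choice d = 4 is arbitrary. Over the field {0,1} the only scalars are 0 and 1, so a set that is closed under addition modulo 2 is also closed under scalar multiplication (note v + v = 0).

```python
from itertools import product


def add_mod2(v, w):
    """Entrywise addition modulo 2 of two bit vectors."""
    return tuple((a + b) % 2 for a, b in zip(v, w))


def is_closed_under_addition(vectors):
    """Check closure under addition mod 2, which suffices for a subspace over F_2."""
    s = set(vectors)
    return all(add_mod2(v, w) in s for v in s for w in s)


def support(w):
    """1-based set of positions where w is nonzero."""
    return {i for i, wi in enumerate(w, start=1) if wi != 0}


d = 4  # arbitrary small dimension for the demonstration
V = list(product((0, 1), repeat=d))
W = [v for v in V if v[0] == v[1]]          # first two entries equal
NOT_W = [v for v in V if v[0] != v[1]]      # first two entries different

if __name__ == "__main__":
    print(is_closed_under_addition(W), is_closed_under_addition(NOT_W))
    print(support((1, 5, 0, 4)))
```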


A.2 Matrices 


Matrices represent linear maps between two vector spaces with particular bases. We assume familiarity 
with the basic rules of matrix addition and multiplication. We use $A_{ij}$ for the $(i,j)$-entry of 
a matrix A and $A^T$ for its transpose, which has $A^T_{ij} = A_{ji}$. We use $I_d$ to denote the $d \times d$ identity 
matrix, which has 1s on its diagonal and 0s elsewhere; we usually omit the subscript d when the 
dimension is clear from context. If A is square and there is a matrix B such that $AB = BA = I$, 
then we use $A^{-1}$ to denote this B, which is called the inverse of A (and is unique if it exists). Note 
that $(AB)^{-1} = B^{-1}A^{-1}$. 

In the remainder of this appendix we will mostly consider the complex field. If A is a matrix 
(not necessarily square), then $A^*$ denotes its conjugate transpose (or adjoint): the matrix obtained 
by transposing A and taking the complex conjugates of all entries. Note that $(AB)^* = B^*A^*$. 
Physicists usually write $A^\dagger$ (pronounced “A-dagger”) instead of $A^*$, but in these notes we will stick 
with the $A^*$ notation that is common in mathematics. 


A.3 Inner product 


For vectors v, w, we use $\langle v|w\rangle = v^*w = \sum_i v_i^* w_i$ for their inner product.¹ The combination of the 
vector space V with this inner product is called a Hilbert space. Two vectors v, w are orthogonal 
if $\langle v|w\rangle = 0$. A set $\{v_i\}$ of vectors is called orthogonal if all vectors are pairwise orthogonal: 
$\langle v_i|v_j\rangle = 0$ if $i \neq j$. If additionally the vectors all have norm 1, then the set is called orthonormal. 

The inner product induces a vector norm $\|v\| = \sqrt{\langle v|v\rangle} = \sqrt{\sum_i |v_i|^2}$. This is the usual 
Euclidean norm (or “length”). The norm in turn induces a distance $\|v - w\|$ between vectors v 
and w. Note that distance and inner product are closely related: 

$\|v - w\|^2 = \langle v - w|v - w\rangle = \|v\|^2 + \|w\|^2 - \langle v|w\rangle - \langle w|v\rangle.$ 

In particular, for unit vectors v and w the real part of their inner product equals $1 - \frac{1}{2}\|v - w\|^2$. 
Hence unit vectors that are close together have an inner product close to 1, and vice versa. The 
Cauchy-Schwarz inequality gives $|\langle v|w\rangle| \le \|v\| \cdot \|w\|$ (see also Appendix B). 

The outer product of v and w is the matrix $vw^*$. 
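
The definitions of this section translate directly into a few lines of code. The sketch below is an illustration with hypothetical helper names, using plain Python lists of complex numbers as vectors. It follows the physics convention used in these notes (conjugate the first argument) and checks the distance identity and the Cauchy-Schwarz inequality on one arbitrarily chosen pair of unit vectors.

```python
import math


def inner(v, w):
    """Inner product <v|w> = sum_i conj(v_i) * w_i (conjugate-linear in v)."""
    return sum(vi.conjugate() * wi for vi, wi in zip(v, w))


def norm(v):
    """Euclidean norm ||v|| = sqrt(<v|v>)."""
    return math.sqrt(inner(v, v).real)


def outer(v, w):
    """Outer product v w^*, as a list of rows."""
    return [[vi * wj.conjugate() for wj in w] for vi in v]


# Two unit vectors chosen only for the demonstration.
v = [1 / math.sqrt(2), 1j / math.sqrt(2)]
w = [1, 0]

if __name__ == "__main__":
    diff = [a - b for a, b in zip(v, w)]
    print(norm(diff) ** 2, 2 - 2 * inner(v, w).real)   # the two sides of the identity
    print(abs(inner(v, w)), norm(v) * norm(w))          # Cauchy-Schwarz
```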


A.4 Unitary matrices 


Below we will restrict attention to square matrices, unless explicitly mentioned otherwise. 
A matrix A is unitary if $A^{-1} = A^*$. The following conditions are equivalent: 


1. A is unitary 

2. A preserves inner product: $\langle Av|Aw\rangle = \langle v|w\rangle$ for all v, w 

3. A preserves norm: $\|Av\| = \|v\|$ for all v 

4. $\|Av\| = 1$ if $\|v\| = 1$ 


(1) implies (2) because if A is unitary then $A^*A = I$, and hence $\langle Av|Aw\rangle = (v^*A^*)Aw = \langle v|w\rangle$. (2) 
implies (1) as follows: if A is not unitary then $A^*A \neq I$, so then there is a w such that $A^*Aw \neq w$ 
and, hence, a v such that $\langle v|w\rangle \neq \langle v|A^*Aw\rangle = \langle Av|Aw\rangle$, contradicting (2). Clearly (2) implies (3). 
Moreover, it is easy to show that (3) implies (2) using the following identity: 


$\|v + w\|^2 = \|v\|^2 + \|w\|^2 + \langle v|w\rangle + \langle w|v\rangle.$ 


The equivalence of (3) and (4) is obvious. Note that by (4), the eigenvalues of a unitary matrix 
have absolute value 1. 

¹ Here we follow a physics convention: mathematicians usually define $\langle v|w\rangle = vw^*$. 


A.5 Diagonalization and singular values 


The complex number $\lambda$ is an eigenvalue of (square) matrix A if there is some nonzero vector v 
(called an eigenvector) such that $Av = \lambda v$. 

Matrices A and B are similar if there is an invertible matrix S such that $A = SBS^{-1}$. Note that 
if $Av = \lambda v$, then $BS^{-1}v = \lambda S^{-1}v$, so similar matrices have the same eigenvalues. Schur’s lemma 
states that every matrix A is similar to an upper triangular matrix: $A = UTU^{-1}$ for some unitary 
U and upper triangular T. Since similar matrices have the same eigenvalues and the eigenvalues of 
an upper triangular matrix are exactly its diagonal entries, the eigenvalues of A form the diagonal 
of T. 

A matrix D is diagonal if $D_{ij} = 0$ whenever $i \neq j$. Let S be some matrix satisfying $AS = SD$ 
for some diagonal matrix D. Let $v_i$ be the i-th column of S and $\lambda_i$ be the i-th entry on the diagonal 
of D, then 

$$AS = \begin{pmatrix} \vdots & & \vdots \\ Av_1 & \cdots & Av_d \\ \vdots & & \vdots \end{pmatrix} = \begin{pmatrix} \vdots & & \vdots \\ \lambda_1 v_1 & \cdots & \lambda_d v_d \\ \vdots & & \vdots \end{pmatrix} = SD,$$ 

and we see that $v_i$ is an eigenvector of A associated with eigenvalue $\lambda_i$. Conversely, if $v_1, \ldots, v_d$ are 
eigenvectors of A with eigenvalues $\lambda_1, \ldots, \lambda_d$, then we have $AS = SD$, where S has the $v_i$ as columns 
and D is the diagonal matrix of $\lambda_i$. We call a square matrix A diagonalizable if it is similar to some 
diagonal matrix D: $A = SDS^{-1}$. This D then has A’s eigenvalues $\lambda_i$ on its diagonal, some of which 
may be zero. Note that A is diagonalizable iff it has a linearly independent set of d eigenvectors. 
These eigenvectors will form the columns of S, giving $AS = SD$, and linear independence ensures 
that S has an inverse, giving $A = SDS^{-1}$. A matrix A is unitarily diagonalizable iff it can be 
diagonalized via a unitary matrix U: $A = UDU^{-1}$. If the columns of U are the vectors $u_i$, and the 
diagonal entries of D are $\lambda_i$, then we can also write $A = \sum_i \lambda_i u_i u_i^*$; this is sometimes called the 
spectral decomposition of A. By the same argument as before, A will be unitarily diagonalizable iff 
it has an orthonormal set of d eigenvectors. 

A matrix A is normal if it commutes with its conjugate transpose ($A^*A = AA^*$). For example, 
unitary matrices are normal. If A is normal and $A = UTU^{-1}$ for some unitary U and upper triangular T (which 
must exist because of Schur’s lemma), then $T = U^{-1}AU$ and $T^* = U^{-1}A^*U$, so $TT^* = U^{-1}AA^*U = 
U^{-1}A^*AU = T^*T$. Hence T is normal and upper triangular, which implies (with a little work) that 
T is diagonal. This shows that normal matrices are unitarily diagonalizable. Conversely, if A is 
unitarily diagonalizable as $UDU^{-1}$, then $AA^* = UDD^*U^{-1} = UD^*DU^{-1} = A^*A$, so then A is normal. Thus 
a matrix is normal iff it is unitarily diagonalizable. If A is not normal, it may still be diagonalizable 
via a non-unitary S, for example: 

$$\underbrace{\begin{pmatrix} 1 & 1 \\ 0 & 2 \end{pmatrix}}_{A} = \underbrace{\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}}_{S} \underbrace{\begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}}_{D} \underbrace{\begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix}}_{S^{-1}}.$$ 

If $A = UDU^{-1}$ then $A^* = UD^*U^{-1}$, so the eigenvalues of $A^*$ are the complex conjugates of the 
eigenvalues of A. 
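
The 2 × 2 example of a non-normal but diagonalizable matrix can be verified mechanically. The sketch below assumes the example reads A = (1 1; 0 2), S = (1 1; 0 1), D = diag(1, 2) and S⁻¹ = (1 −1; 0 1); the helper names are made up for illustration. Because all entries are integers, the comparisons are exact.

```python
def matmul(A, B):
    """Matrix product AB for matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def dagger(A):
    """Conjugate transpose A^*."""
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]


def is_normal(A):
    """A is normal iff A A^* == A^* A (exact for integer entries)."""
    return matmul(A, dagger(A)) == matmul(dagger(A), A)


A = [[1, 1], [0, 2]]
S = [[1, 1], [0, 1]]        # columns are eigenvectors of A
D = [[1, 0], [0, 2]]        # corresponding eigenvalues on the diagonal
S_INV = [[1, -1], [0, 1]]

if __name__ == "__main__":
    print(matmul(matmul(S, D), S_INV) == A)   # A = S D S^{-1}
    print(matmul(A, S) == matmul(S, D))       # AS = SD, column by column
    print(is_normal(A))                       # A is not normal
```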
An important class of normal (and hence unitarily diagonalizable) matrices are the Hermitian 
matrices, which are the ones satisfying $A = A^*$. Note that the last line of the previous paragraph 
implies that the eigenvalues of Hermitian matrices are real. 

A Hermitian matrix A is called positive semidefinite (resp. positive definite), if all its eigenvalues 
are nonnegative (resp. positive). An equivalent definition is that A is positive semidefinite (psd) iff 
there exists a matrix C such that $A = C^*C$ (in other words, there exist vectors $c_i$ such that for all 
i, j, we have $A_{ij} = \langle c_i|c_j\rangle$). A useful and easy to prove property is that A is psd iff $\mathrm{Tr}(AB) \ge 0$ for 
all psd matrices B. By defining $A \succeq B$ iff $A - B$ is positive semidefinite, we obtain a partial ordering 
on the set of all Hermitian matrices. If all eigenvalues are 0 or 1, then A is called a projection (or 
projection matrix or projector). This is equivalent to requiring $A^2 = A$. 

Not all matrices are diagonalizable, for instance $A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$ is not. However, every matrix 
has a singular value decomposition (SVD), which is almost as useful as a diagonalization. We 
will derive this below for an invertible square matrix A. Since $A^*A$ is psd, we can write $A^*A = 
VDV^{-1}$ for some unitary V whose columns $v_i$ are the orthonormal eigenvectors of $A^*A$, and some 
nonnegative diagonal matrix D with the corresponding eigenvalues. The entries $\sigma_i$ of the matrix $\Sigma = 
\sqrt{D}$ are called the singular values of A (for a general matrix some of them may be zero, but for invertible A 
they are all positive). Define vectors $u_i = Av_i/\sigma_i$. 
Note that the $u_i$ form an orthonormal system because $u_i^*u_j = v_i^*(A^*Av_j)/\sigma_i\sigma_j = v_i^*(\sigma_j^2 v_j)/\sigma_i\sigma_j$, 
which is 1 if $i = j$ and 0 if $i \neq j$. Hence the matrix U that has these $u_i$’s as columns is unitary. 
We have $U = AV\Sigma^{-1}$, so we can write $A = U\Sigma V^{-1}$, which is the SVD of A. Equivalently, we can 
write $A = \sum_i \sigma_i u_i v_i^*$. This derivation of the SVD $A = U\Sigma V^{-1}$ can easily be extended to arbitrary 
$m \times n$ matrices A; U will be an $m \times m$ unitary, $\Sigma$ will be $m \times n$ (padded with 0s if the rank of A 
is < m, n), and V will be an $n \times n$ unitary. 
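
As a small worked instance of the SVD derivation above, here are the steps for one invertible 2 × 2 matrix. The matrix is chosen here purely for illustration and does not come from the notes; it is picked so that $A^*A$ is already diagonal and V can be taken to be the identity.

```latex
\begin{align*}
A &= \begin{pmatrix} 0 & 2 \\ 1 & 0 \end{pmatrix}, \qquad
A^*A = \begin{pmatrix} 0 & 1 \\ 2 & 0 \end{pmatrix}
       \begin{pmatrix} 0 & 2 \\ 1 & 0 \end{pmatrix}
     = \begin{pmatrix} 1 & 0 \\ 0 & 4 \end{pmatrix}
     = V D V^{-1} \text{ with } V = I,\ D = \mathrm{diag}(1, 4), \\
\Sigma &= \sqrt{D} = \mathrm{diag}(1, 2), \qquad
u_1 = A v_1 / \sigma_1 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \qquad
u_2 = A v_2 / \sigma_2 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \\
U \Sigma V^{-1} &= \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}
                   \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix} I
                 = \begin{pmatrix} 0 & 2 \\ 1 & 0 \end{pmatrix} = A .
\end{align*}
```

The vectors $u_1, u_2$ are orthonormal, as the derivation predicts, and the singular values 1 and 2 are the square roots of the eigenvalues of $A^*A$.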


A.6 Tensor products 


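If $A = (A_{ij})$ is an $m \times n$ matrix and B an $m' \times n'$ matrix, then their tensor product (a.k.a. Kronecker 
product) is the $mm' \times nn'$ matrix 

$$A \otimes B = \begin{pmatrix} A_{11}B & \cdots & A_{1n}B \\ A_{21}B & \cdots & A_{2n}B \\ \vdots & \ddots & \vdots \\ A_{m1}B & \cdots & A_{mn}B \end{pmatrix}.$$ 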
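For example: 

$$\begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{pmatrix} \otimes \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & \frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & 0 & -\frac{1}{\sqrt{2}} & 0 \\ 0 & \frac{1}{\sqrt{2}} & 0 & -\frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} & 0 \end{pmatrix}.$$ 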

Note that the tensor product of two numbers (i.e., $1 \times 1$ matrices) is itself just a number, and the 
tensor product of two column vectors is itself a column vector. 
The following properties of the tensor product are easily verified: 


• $c(A \otimes B) = (cA) \otimes B = A \otimes (cB)$ for all scalars c 

• $(A \otimes B)^* = A^* \otimes B^*$, and similarly for inverse and transpose (note that the order of the tensor 
factors doesn’t change). 

• $A \otimes (B + C) = (A \otimes B) + (A \otimes C)$ 

• $A \otimes (B \otimes C) = (A \otimes B) \otimes C$ 

• $(A \otimes B)(C \otimes D) = (AC) \otimes (BD)$ 


Different vector spaces can also be combined using tensor products. If V and V′ are vector spaces of 
dimension d and d′ with basis $\{v_1, \ldots, v_d\}$ and $\{v'_1, \ldots, v'_{d'}\}$, respectively, then their tensor product 
space is the $d \cdot d'$-dimensional space $W = V \otimes V'$ spanned by $\{v_i \otimes v'_j \mid 1 \le i \le d, 1 \le j \le d'\}$. 
Applying a linear operation A to V and B to V′ corresponds to applying the tensor product $A \otimes B$ 
to the tensor product space W. 
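
The block definition of the tensor product is short to implement, which makes the listed properties easy to test on examples. The sketch below is illustrative: `kron` and `matmul` are hypothetical helper names, and matrices are plain lists of rows with integer entries so that all comparisons are exact.

```python
def kron(A, B):
    """Tensor (Kronecker) product: the block matrix whose (i, j) block is A[i][j] * B."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def matmul(A, B):
    """Matrix product AB for matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


if __name__ == "__main__":
    A = [[1, 1], [1, -1]]      # unnormalized Hadamard, to keep entries integral
    B = [[0, 1], [-1, 0]]
    C = [[2, 0], [1, 3]]
    D = [[1, 4], [0, 1]]
    for row in kron(A, B):
        print(row)
    # Mixed-product property: (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD)
    print(matmul(kron(A, B), kron(C, D)) == kron(matmul(A, C), matmul(B, D)))
```

For two column vectors, represented as n × 1 matrices, the same function returns a column vector, in line with the remark above.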


A.7 Trace 


The trace of a matrix A is the sum of its diagonal entries: $\mathrm{Tr}(A) = \sum_i A_{ii}$. Some important and 
easily verified properties of $\mathrm{Tr}(A)$ are: 


• $\mathrm{Tr}(A + B) = \mathrm{Tr}(A) + \mathrm{Tr}(B)$ 

• $\mathrm{Tr}(AB) = \mathrm{Tr}(BA)$, which is known as the “cyclic property” of the trace. 
For example, $\mathrm{Tr}(Avv^*) = v^*Av$. 

• $\mathrm{Tr}(A)$ is the sum of the eigenvalues of A. 
(This follows from Schur and the previous item: $\mathrm{Tr}(A) = \mathrm{Tr}(UTU^{-1}) = \mathrm{Tr}(U^{-1}UT) = 
\mathrm{Tr}(T) = \sum_i \lambda_i$.) 

• $\mathrm{Tr}(A \otimes B) = \mathrm{Tr}(A)\mathrm{Tr}(B)$ 
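
The trace properties listed above can be confirmed on concrete matrices. This is a minimal sketch with invented helper names and arbitrarily chosen integer matrices, so the equalities hold exactly; it checks instances of the properties and is not a proof.

```python
def trace(A):
    """Sum of the diagonal entries of a square matrix."""
    return sum(A[i][i] for i in range(len(A)))


def matmul(A, B):
    """Matrix product AB for matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def kron(A, B):
    """Tensor (Kronecker) product of A and B."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[0, 1], [5, -2]]
    v = [[1], [2]]                       # column vector as a 2 x 1 matrix
    v_star = [[1, 2]]                    # its conjugate transpose (entries are real)
    print(trace(matmul(A, B)), trace(matmul(B, A)))            # cyclic property
    print(trace(matmul(A, matmul(v, v_star))),                 # Tr(A v v^*)
          matmul(v_star, matmul(A, v))[0][0])                  # v^* A v
    print(trace(kron(A, B)), trace(A) * trace(B))              # multiplicativity
```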


A.8 Rank 


The rank of a matrix A (over a field $\mathbb{F}$) is the size of a largest linearly independent set of rows 
of A (linear independence taken over $\mathbb{F}$). Unless mentioned otherwise, we take $\mathbb{F}$ to be the field 
of complex numbers. We say that A has full rank if its rank equals its dimension. The following 
properties are all easy to show: 


• $\mathrm{rank}(A) = \mathrm{rank}(A^*)$ 

• if A is diagonalizable, then $\mathrm{rank}(A)$ equals the number of nonzero eigenvalues of A (counting multiplicity) 

• $\mathrm{rank}(A + B) \le \mathrm{rank}(A) + \mathrm{rank}(B)$ 

• $\mathrm{rank}(AB) \le \min\{\mathrm{rank}(A), \mathrm{rank}(B)\}$ 

• $\mathrm{rank}(A \otimes B) = \mathrm{rank}(A) \cdot \mathrm{rank}(B)$ 

• A has an inverse iff A has full rank 


A.9 The Pauli matrices 


The four Pauli matrices are: 


$$I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad Y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$ 


Note that each Pauli matrix P is both unitary and Hermitian, and hence self-inverse: $P^{-1} = P$. 
This implies that their eigenvalues are in $\{-1, 1\}$. Non-identity Paulis anti-commute: if $P, Q \in 
\{X, Y, Z\}$ are distinct then $PQ = -QP$. Note that $Y = iXZ$. Also, products of two distinct Pauli 
matrices have trace 0. 

Define the Hilbert-Schmidt inner product on the space of $d \times d$ matrices as $\langle A, B\rangle = \frac{1}{d}\mathrm{Tr}(A^*B)$. 
With respect to this inner product (for d = 2), the four Pauli matrices form an orthonormal set. 
This implies that every complex $2 \times 2$ matrix A can be written as a linear combination of the Pauli 
matrices: 

$A = \alpha_0 I + \alpha_1 X + \alpha_2 Y + \alpha_3 Z$, 

with complex coefficients $\alpha_i$. If A is Hermitian, then these coefficients will be real. 

We can also consider the n-qubit Paulis, which are n-fold tensor products of the above $2 \times 2$ 
Paulis. For example $X \otimes Z \otimes I \otimes Y \otimes Z$ is a 5-qubit Pauli. There are $4^n$ n-qubit Paulis, since we 
have 4 possibilities for each of the n tensor factors, and these $4^n$ matrices form an orthonormal set 
w.r.t. the Hilbert-Schmidt inner product. Accordingly, every $2^n \times 2^n$ matrix A can be written uniquely 
as a linear combination of the $4^n$ n-qubit Paulis. Again, if A is Hermitian, then the $4^n$ coefficients 
will be real. 
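
Because the Paulis are orthonormal under the Hilbert-Schmidt inner product, the coefficients of a 2 × 2 matrix A in the Pauli basis can be read off as $\alpha_P = \langle P, A\rangle = \frac{1}{2}\mathrm{Tr}(P^*A)$. The sketch below illustrates this on an arbitrary matrix; the helper names (`hs_inner`, `pauli_coefficients`, `from_coefficients`) are invented for this example.

```python
I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]
PAULIS = {"I": I2, "X": X, "Y": Y, "Z": Z}


def matmul(A, B):
    """Matrix product AB for matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def dagger(A):
    """Conjugate transpose A^*."""
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]


def hs_inner(A, B):
    """Hilbert-Schmidt inner product <A, B> = Tr(A^* B) / d."""
    d = len(A)
    P = matmul(dagger(A), B)
    return sum(P[i][i] for i in range(d)) / d


def pauli_coefficients(A):
    """Coefficients of a 2x2 matrix A in the basis I, X, Y, Z."""
    return {name: hs_inner(P, A) for name, P in PAULIS.items()}


def from_coefficients(coeffs):
    """Rebuild the matrix sum_P coeffs[P] * P."""
    return [[sum(coeffs[name] * PAULIS[name][i][j] for name in PAULIS) for j in range(2)]
            for i in range(2)]


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]                  # arbitrary (non-Hermitian) example
    coeffs = pauli_coefficients(A)
    print(coeffs)
    print(from_coefficients(coeffs))
```

For a Hermitian input the returned coefficients have zero imaginary part, matching the remark above that the coefficients are then real.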


A.10 Dirac notation 


Physicists often write their linear algebra in Dirac notation, and we will follow that custom for 
denoting quantum states. In this notation we write $|v\rangle = v$ and $\langle v| = v^*$. The first is called a ket, 
the second a bra. Some points about this notation: 


• $\langle v|w\rangle = \langle v||w\rangle$: inner products are bra-ket (“bracket”) products. 

• If matrix A is unitarily diagonalizable, then $A = \sum_i \lambda_i |v_i\rangle\langle v_i|$ for some orthonormal set of 
eigenvectors $\{v_i\}$. 

• $|v\rangle\langle v| \otimes |w\rangle\langle w| = (|v\rangle \otimes |w\rangle)(\langle v| \otimes \langle w|)$, the latter is often abbreviated to $|v\rangle \otimes |w\rangle\langle v| \otimes \langle w|$. 
Abbreviating the latter further by omitting the tensor product leads to dangerous ambiguity, 
though sometimes it’s still clear from context. 

• $(U|v\rangle)^* = \langle v|U^*$ and $(|u\rangle \otimes |v\rangle)^* = \langle u| \otimes \langle v|$ (the ordering of tensor factors doesn’t change). 

• Don’t write kets inside of other kets or bras: the notation $\langle u|(\alpha|v\rangle + \beta|w\rangle)\rangle$ doesn’t really 
make sense. 


Appendix B 


Some other Useful Math and CS 


Here we collect various basic but useful facts and definitions needed in parts of the lecture notes. 


B.1 Some notation, equalities and inequalities 


• We use [n] to denote the set $\{1, \ldots, n\}$, and $\delta_{a,b} \in \{0,1\}$ to indicate whether a = b or not. 

• If P is a statement which can be true or false, then $[P] \in \{0,1\}$ denotes its truth value. 

• Logarithms will always be to base 2 unless stated otherwise. 

• A complex number is of the form $c = a + bi$, where $a, b \in \mathbb{R}$, and i is the imaginary unit, 
which satisfies $i^2 = -1$. Such a c can also be written as $c = re^{i\phi}$ where $r = |c| = \sqrt{a^2 + b^2}$ 
is the magnitude (a.k.a. modulus or norm) of c, and $\phi \in [0, 2\pi)$ is the angle that c makes 
with the positive horizontal axis when we view it as a point (a, b) in the plane. Note that 
complex numbers of magnitude 1 lie on the unit circle in this plane. We can also write those 
as $e^{i\phi} = \cos(\phi) + i\sin(\phi)$. The complex conjugate $c^*$ is $a - ib$, equivalently $c^* = re^{-i\phi}$. 


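• The Cauchy-Schwarz inequality: for $a = (a_1, \ldots, a_n) \in \mathbb{C}^n$ and $b = (b_1, \ldots, b_n) \in \mathbb{C}^n$ 

$$\left|\sum_{i=1}^n a_i^* b_i\right| \le \sqrt{\sum_{i=1}^n |a_i|^2} \cdot \sqrt{\sum_{i=1}^n |b_i|^2}.$$ 

Equivalently, written in terms of inner products and norms of vectors: $|\langle a|b\rangle| \le \|a\| \cdot \|b\|$. 
Proof for the case with real entries: for every real $\lambda$ we have $0 \le \langle a - \lambda b|a - \lambda b\rangle = \|a\|^2 + \lambda^2\|b\|^2 - 2\lambda\langle a|b\rangle$. 
Now set $\lambda = \|a\|/\|b\|$ and rearrange (a slightly more complicated proof works if $a, b \in \mathbb{C}^n$). 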


• Geometric sum: $\sum_{j=0}^{m-1} z^j = \begin{cases} m & \text{if } z = 1 \\ \frac{1 - z^m}{1 - z} & \text{if } z \neq 1 \end{cases}$ 

Proof: The case z = 1 is obvious; for the case $z \neq 1$, observe $(1 - z)\left(\sum_{j=0}^{m-1} z^j\right) = \sum_{j=0}^{m-1} z^j - \sum_{j=1}^{m} z^j = 1 - z^m$. 
For example, if $z = e^{2\pi i r/N}$ is a root of unity, with r an integer in $\{1, \ldots, N - 1\}$, then 

$$\sum_{j=0}^{N-1} z^j = \frac{1 - e^{2\pi i r}}{1 - e^{2\pi i r/N}} = 0.$$ 

• The ratio in the previous line can be rewritten using the identity $|1 - e^{i\theta}| = 2|\sin(\theta/2)|$; 
this identity can be seen by drawing the numbers 1 and $e^{i\theta}$ as vectors from the origin in the 
complex plane, and dividing their angle $\theta$ in two. Some other useful trigonometric identities: 
$\cos(\theta)^2 + \sin(\theta)^2 = 1$, $\sin(2\theta) = 2\sin(\theta)\cos(\theta)$. 


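• $1 + x \le e^x$ for all real numbers x (positive as well as negative). 

• If $\varepsilon_j \in [0,1]$ then $1 - \sum_{j=1}^k \varepsilon_j \le \prod_{j=1}^k (1 - \varepsilon_j) \le e^{-\sum_{j=1}^k \varepsilon_j}$. 

Proof: The upper bound comes from the preceding item. The lower bound follows easily by induction on k, 
using the fact that $(1 - \varepsilon_1)(1 - \varepsilon_2) = 1 - \varepsilon_1 - \varepsilon_2 + \varepsilon_1\varepsilon_2 \ge 1 - \varepsilon_1 - \varepsilon_2$. 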
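
Several of the facts collected in this section are easy to sanity-check numerically. The sketch below is only an illustration with made-up function names: it compares the direct geometric sum with the closed form, confirms that the sum over a nontrivial N-th root of unity vanishes, and evaluates both sides of the product bounds for one arbitrary choice of the $\varepsilon_j$.

```python
import cmath
import math


def geometric_sum(z, m):
    """Direct evaluation of sum_{j=0}^{m-1} z^j."""
    return sum(z**j for j in range(m))


def closed_form(z, m):
    """The closed form: m if z == 1, else (1 - z^m) / (1 - z)."""
    return m if z == 1 else (1 - z**m) / (1 - z)


def product_bounds(eps):
    """Return (1 - sum eps_j, prod (1 - eps_j), exp(-sum eps_j)) for eps_j in [0, 1]."""
    total = sum(eps)
    return 1 - total, math.prod(1 - e for e in eps), math.exp(-total)


if __name__ == "__main__":
    N, r = 8, 3                                   # illustrative values
    z = cmath.exp(2j * math.pi * r / N)
    print(abs(geometric_sum(z, N)))               # essentially 0, up to rounding
    theta = 0.7
    print(abs(1 - cmath.exp(1j * theta)), 2 * abs(math.sin(theta / 2)))
    print(product_bounds([0.1, 0.2, 0.05]))
```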


B.2 Algorithms and probabilities 


• When we do not care about constant factors, we’ll often use big-Oh notation: $T(n) = O(f(n))$ 
means there exist constants $c, n_0 \ge 0$ such that for all integers $n \ge n_0$, we have $T(n) \le cf(n)$. 
Similarly, big-Omega notation is used for lower bounds: $T(n) = \Omega(f(n))$ means there exist 
constants $c, n_0 \ge 0$ such that $T(n) \ge cf(n)$ for all $n \ge n_0$. $T(n) = \Theta(f(n))$ means that 
simultaneously $T(n) = O(f(n))$ and $T(n) = \Omega(f(n))$. Such notation is often used to write 
upper and/or lower bounds on the running time of algorithms as a function of their input 
length n. 


• For $N = 2^n$, we can identify the integers $\{0, \ldots, N - 1\}$ with their n-bit binary representations 
as follows: the bitstring $x = x_{n-1} \ldots x_1 x_0 \in \{0,1\}^n$ corresponds to the integer $\sum_{i=0}^{n-1} x_i 2^i$. 
The leftmost bit $x_{n-1}$ is called the most significant bit of x (since it corresponds to the largest 
power of two, $2^{n-1}$), and the rightmost bit $x_0$ is its least significant bit (it corresponds to 
$2^0 = 1$, so determines whether x is an even or odd integer). For example, if n = 3 then the 
bitstring $x = x_2 x_1 x_0 = 101$ corresponds to the integer $x_2 \cdot 4 + x_1 \cdot 2 + x_0 \cdot 1 = 4 + 1 = 5$. 
The integer 0 corresponds to the bitstring $0^n$ (if we use 0 to denote a bitstring of 0s, then the 
value of n should be clear from context). 


We can also use binary notation for non-integral numbers, with the bits to the right of the 
decimal dot corresponding to negative powers of two (1/2, 1/4, 1/8, etc.). For example, 
0.1 denotes 1/2 and 10.101 denotes 2 + 1/2 + 1/8 = 21/8. Note that multiplying by two 
corresponds to shifting the dot to the right, and dividing corresponds to shifting to the left. 


• The union bound says that the probability of the union (or logical “or”) of two events is at 
most the sum of their probabilities: $\Pr[A \vee B] \le \Pr[A] + \Pr[B]$. This inequality should be 
obvious from drawing a Venn diagram. More generally, if we have T events $A_1, \ldots, A_T$, then 
$\Pr[A_1 \vee \cdots \vee A_T] \le \sum_{i=1}^T \Pr[A_i]$. 


• A (discrete) random variable X is an object that takes value $x_i$ with probability $p_i$. Its 
expected value (or expectation) is $\mu = E[X] = \sum_i p_i x_i$. Its variance is $\sigma^2 = \mathrm{Var}[X] = 
E[(X - E[X])^2] = E[X^2] - E[X]^2$. Its standard deviation is $\sigma$. 


• Linearity of expectation says that, for values $a_1, \ldots, a_m$ and random variables $X_1, \ldots, X_m$, 
we have $E[\sum_{j=1}^m a_j X_j] = \sum_{j=1}^m a_j E[X_j]$ (which is easy to verify). 


• Random variable X is independent from random variable Y, if the value of Y does not affect 
the probability distribution of X, i.e., $\Pr[X = x] = \Pr[X = x \mid Y = y]$ for all possible values 
x, y. If X and Y are independent, then $E[X \cdot Y] = E[X] \cdot E[Y]$. 


• Three basic upper bounds on the tails of probability distributions: 

Markov: if X is a nonnegative random variable with expectation $\mu$, then $\Pr[X \ge k\mu] \le 1/k$. 
Proof: Since X is nonnegative, $\mu \ge \Pr[X \ge k\mu] \cdot k\mu$. 

Chebyshev: if X is a random variable with expectation $\mu$ and standard deviation $\sigma$, then 
$\Pr[|X - \mu| \ge k\sigma] \le 1/k^2$. 
Proof: Apply Markov to the random variable $|X - \mu|^2$, whose expectation is $\sigma^2$. 

Chernoff/Hoeffding: if $X = \sum_{i=1}^n X_i$ is the sum of n independent, identically distributed 
random variables $X_i \in \{0,1\}$, each with expectation $\Pr[X_i = 1] = p$, then X has expectation 
$\mu = np$, and exponentially decreasing tail bound $\Pr[|X - \mu| \ge \alpha n] \le 2e^{-2\alpha^2 n}$. 
Proof idea: For all parameters $\lambda > 0$, we have $\Pr[X - \mu \ge t] = \Pr[e^{\lambda X} \ge e^{\lambda(t + \mu)}]$. Upper bound the latter 
probability by applying Markov to the random variable $e^{\lambda X}$. This is a product of n independent random 
variables $e^{\lambda X_i}$, so its expectation is easy to analyze. Then choose $\lambda$ to minimize the upper bound. 


• A randomized algorithm is a classical algorithm that can flip random coins during its operation, 
meaning its behavior is partially determined by chance and its output is a random 
variable that depends on its input, rather than a deterministic function of its input. One can 
think of a randomized algorithm as a probability distribution over deterministic algorithms 
(one deterministic algorithm for each setting of the coins). 


• When we say a (randomized or quantum) algorithm has error probability $\le 1/3$, this typically 
means in the worst case: for every possible input, the algorithm produces the correct answer 
with probability $\ge 2/3$, where probability is taken over the random coin flips and/or quantum 
measurements during its operation. Such statements do not refer to “most” inputs under some 
distribution unless stated explicitly. 


• If a (randomized or quantum) algorithm produces the correct answer in expected running 
time T (meaning for each input its expected running time is $\le T$), then we can convert that 
into an algorithm with worst-case running time 3T and error probability $\le 1/3$, as follows. 
Run the original algorithm for 3T steps, and just cut it off if it hasn’t terminated by itself. 
The probability of non-termination within 3T steps is at most 1/3 by Markov’s inequality. 
Hence with probability $\ge 2/3$ we will have the correct answer. 


• The union bound is very useful for analyzing an algorithm with T randomized subroutines, 
each of which has its own small failure probability $\le 1/(3T)$. If these failure events are independent, 
then the probability that none of the T subroutines fails is at least $(1 - 1/(3T))^T \ge 
1 - T/(3T) = 2/3$ (the inequality uses the last bullet of B.1). But what if these events are 
dependent on each other, which may well happen if the subroutines depend on what happens 
elsewhere in the algorithm? Then by the union bound the probability that there is at least 
one subroutine that fails is at most $T \cdot 1/(3T) = 1/3$. In other words, the probability that 
none of the T subroutines fails is still $\ge 2/3$. 


• If a (classical or quantum) algorithm with 0/1-outputs has error probability $\le 1/3$, then 
we can cheaply reduce this error probability to small $\varepsilon > 0$, as follows. Choose odd $n = 
O(\log(1/\varepsilon))$ such that $2e^{-2\alpha^2 n} \le \varepsilon$ for $\alpha = 1/6$. Run the original algorithm n times and 
output the majority among the n output bits. The probability that this majority is wrong 
(i.e., that the number of correct output bits is more than $\alpha n$ below its expectation), is at 
most $\varepsilon$ by the Chernoff bound. Hence we output the correct answer with probability $\ge 1 - \varepsilon$. 
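
The error-reduction recipe in the last item can be made concrete. The sketch below is an illustration with invented function names, not code from the notes: it picks the smallest odd n with $2e^{-2\alpha^2 n} \le \varepsilon$ for $\alpha = 1/6$, and then computes the exact probability that the majority vote is wrong, assuming each of the n independent runs is correct with probability exactly 2/3 (the worst case allowed by an error probability of at most 1/3).

```python
import math


def hoeffding_bound(alpha: float, n: int) -> float:
    """Two-sided Chernoff/Hoeffding bound 2 * exp(-2 * alpha^2 * n)."""
    return 2 * math.exp(-2 * alpha**2 * n)


def repetitions_needed(eps: float, alpha: float = 1 / 6) -> int:
    """Smallest odd n such that the Hoeffding bound is at most eps."""
    n = 1
    while hoeffding_bound(alpha, n) > eps:
        n += 2
    return n


def majority_error(n: int, p_correct: float = 2 / 3) -> float:
    """Exact probability that at most n // 2 of n independent runs are correct (n odd),
    i.e. that the majority vote outputs the wrong bit."""
    return sum(math.comb(n, k) * p_correct**k * (1 - p_correct) ** (n - k)
               for k in range(n // 2 + 1))


if __name__ == "__main__":
    eps = 0.01  # illustrative target error
    n = repetitions_needed(eps)
    print(n, hoeffding_bound(1 / 6, n), majority_error(n))
```

The exact majority error is typically far below the Hoeffding bound, which reflects that the bound is convenient rather than tight; the number of repetitions still only grows like $\log(1/\varepsilon)$.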
Appendix C 


Hints for Exercises 


Chapter 1 


9. Find the maximal p such that $I - 2p|-\rangle\langle-| - 2p|1\rangle\langle 1|$ is still psd. 

10. Consider what U has to do when $|\phi\rangle = |0\rangle$, when $|\phi\rangle = |1\rangle$, and when $|\phi\rangle$ is a superposition of 
these two. 

13.b. Use the facts that $\mathrm{Tr}(D|\psi\rangle\langle\psi|) = \langle\psi|D|\psi\rangle$ and that products of 2 distinct Paulis have trace 0. 
This exercise is just superdense coding in disguise. 


Chapter 2 


6. Use Exercise 5. 

7. Instead of measuring the qubit, apply a CNOT that “copies” it to a new |0)-qubit, which is then 
left alone until the end of the computation. Analyze what happens. 

13. Use the Bernstein-Vazirani algorithm. 


Chapter 3 


6.c. Approximate the state of part (a) using the subroutine of part (b), and see what happens if 
you apply Hadamards to the approximate state. Use the fact that $\frac{1}{2^N}\sum_{w=0}^{N/2 + 2\sqrt{N}} \binom{N}{w}$ is nearly 1, 
because this is the probability that if you flip N fair coins, $N/2 - 2\sqrt{N}$ or more of them come up 
“heads.” 


Chapter 4 


3. Use $|\alpha_i^2 - \beta_i^2| = |\alpha_i - \beta_i| \cdot |\alpha_i + \beta_i|$ and the Cauchy-Schwarz inequality. 

4.e. Use triangle inequality. 

4.f. Drop all phase-gates with small angles ¢ < 1/n? from the O(n?)-gate circuit for Fon explained 
in Section 4.5. Calculate how many gates are left in the circuit, and analyze the distance between 
the unitaries corresponding to the new circuit and the original circuit. 


Chapter 5 


1.a. You may invoke here (without proof) the Schönhage-Strassen algorithm for fast multiplication 
[218, 160]. This allows you to multiply two n-bit integers mod N using $O(n \log(n) \log\log(n))$ 
steps (where $n = \lceil \log N \rceil$).¹ 

3.a. The prime number theorem implies that $\Omega(N/\ln N)$ of the numbers between 1 and N are 
prime; also there is an efficient classical algorithm to test if a given number is prime [5]. You may 
use these facts, but be explicit in how many bits your primes p and q will have. 

3.b. Use the result of Exercise 1 (no need to rederive that here). 

3.c. The set of all possible messages forms a group of size $\phi(N)$. Euler’s Theorem says that in any 
group G, we have $a^{|G|} = 1$ for all $a \in G$ (here ‘1’ is the identity element in the group). 


Chapter 6 


4.b. For M = D E;, show that ||M||? = || M?]| < ŝI|M]| + a small constant. 
5. You could use the SWAP-test from Section 16.6. 


Chapter 7 


4.b. Recall that if there are i > 0 solutions, then one variant of Grover’s algorithm finds a solution 
using an expected number of $O(\sqrt{N/i})$ queries. 

5.e. Choose y in (d) such that applying [k] rounds of amplitude amplification to A results in a 
solution for y with probability 1. 

6.a. Try running the exact version of Grover (see end of Section 7.2) with different guesses for what 
the actual ¢ is. 

7.a. Run the basic Grover search with a cleverly chosen number of iterations. 

7.b. Use binary search on top of (a). 

8.d. The eigenvalues of a 2-dimensional rotation matrix over angle À are e and e you don’t need 
to prove this). You may also assume a < 1, so that the differences between vya, sin v'a, arcsin \/a 
are negligible. You may refer to the lecture notes for phase estimation without further proof, incl. 
the fact that phase estimation gives a good n-bit approximation with high probability even if the 
phase cannot be represented exactly with n bits of precision. 

8.e. Define an A that involves one query to x and where a = t/N. Invoke (d) with $\varepsilon$ proportional
to $1/\sqrt{N}$. You can use $|\tilde{a} - a| = |\sqrt{\tilde{a}} - \sqrt{a}| \cdot |\sqrt{\tilde{a}} + \sqrt{a}|$ in your analysis of the approximation error.

9.b. Combine amplitude amplification with the algorithm of (a), for a smart choice of k. Your 
answer may refer to the lecture notes for the details of amplitude amplification. 

10. Start with $m = x_i$ for a random i, and repeatedly use Grover’s algorithm to find an index j
such that $x_j < m$ and update $m = x_j$. Continue this until you can find no element smaller than m,
and analyze the number of queries of this algorithm. You are allowed to argue about this algorithm
on a high level (i.e., things like “use Grover to search for a j such that...” are OK), no need to
write out complete circuits. You do, however, have to take into account that the various runs of
Grover each have their own error probability.

11.b. What is the probability in (a) if you set s to roughly $\sqrt{N}$?

11.c. Choose a set S of size $s = O(N^{1/3})$, and classically query all its elements. First check if S
contains a collision. If yes, then you’re done. If not, then use Grover to find a $j \notin S$ that collides
with an $i \in S$.


¹ Shor used the Schönhage-Strassen algorithm in his original paper. We could also invoke the more recent
improvement of Harvey and van der Hoeven [135], who remove the log log n factor.


Chapter 8


4.a. Choose a uniformly random vector $v \in \{0,1\}^n$, calculate ABv and Cv, and check whether
these two vectors are the same. 
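
The check described in 4.a can be sketched as follows. This is a minimal classical illustration using plain Python lists; the function names and the number of repetitions are assumptions made for the example. Each round costs three matrix-vector products instead of one matrix product, and if AB differs from C a single round notices with probability at least 1/2.

```python
import random


def mat_vec(M: list[list[int]], v: list[int]) -> list[int]:
    """Matrix-vector product over the integers."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def one_round(A, B, C) -> bool:
    """Return True if ABv == Cv for a uniformly random 0/1 vector v."""
    n = len(C)
    v = [random.randint(0, 1) for _ in range(n)]
    return mat_vec(A, mat_vec(B, v)) == mat_vec(C, v)


def probably_equal(A, B, C, rounds: int = 20) -> bool:
    """Accept only if every round passes; a wrong C survives all rounds
    with probability at most 2**(-rounds)."""
    return all(one_round(A, B, C) for _ in range(rounds))


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[0, 1], [1, 0]]
    print(probably_equal(A, B, [[2, 1], [4, 3]]))  # correct product
    print(probably_equal(A, B, [[2, 1], [4, 4]]))  # one wrong entry
```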

4.b. Consider the case where A is the all-0 matrix. 

4.c. Modify the algorithm for collision-finding: use a quantum walk on the Johnson graph J(n,r),
where each vertex corresponds to a set $R \subseteq [n]$, and that vertex is marked if there are $i, j \in R$ such
that $(AB)_{i,j} \ne C_{i,j}$. Optimize over r.

5.b. There’s no need to use the C, U, S-framework of the chapter here; the answer is much simpler. 
View the 3n-step random walk algorithm as a deterministic algorithm with an additional input 
$r \in \{0,1\}^n \times \{1,2,3\}^{3n}$, where the first n bits determine x, and the last 3n entries determine which
variable of the leftmost false clauses will be flipped in the 3n steps of the random walk. Use Grover 
search on the space of all possible r, or amplitude amplification (no need to write out complete 
circuits here). 


Chapter 9 


3. Use induction on m, and the fact that there exists a constant c such that for A, B of small norm
we have $e^{A+B} = e^A e^B + E$ for some E of norm $\|E\| \le c\|A\| \cdot \|B\|$.

6. Calculate the subnormalized second-register state $(\langle 0| \otimes I)(W^{-1} \otimes I)V(W \otimes I)|0\rangle|\psi\rangle$.

8.d. Like in the analysis of Grover’s algorithm and regular amplitude amplification (Chapter 7), 
the product of two reflections on S is a rotation of S. 

9. Use triangle inequality, $\|H\| \le 1$, and the fact that $k! \ge (k/e)^k$.
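
For completeness, the factorial bound used in hint 9 follows in one line from the Taylor series of the exponential function, by keeping only the j = k term of the sum (a short derivation, valid for every integer $k \ge 0$):

```latex
e^{k} \;=\; \sum_{j=0}^{\infty} \frac{k^{j}}{j!} \;\ge\; \frac{k^{k}}{k!}
\quad\Longrightarrow\quad
k! \;\ge\; \frac{k^{k}}{e^{k}} \;=\; \left(\frac{k}{e}\right)^{k}.
```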

10.b. $W_2$ just implements a rotation on the first qubit, by an angle that depends on $A_{ij}$. If you
have a basis state $|0\rangle|a\rangle$ where $a \in [0,1]$ is some real number written in some fixed finite number
of bits, then you can rotate the first qubit to $\sqrt{a}|0\rangle + \sqrt{1-a}|1\rangle$ by a small circuit that does some
single-qubit gates on the first qubit conditioned on the bits in the $|a\rangle$-part. That circuit is the
same for all values of a, so it’s independent of the particular $|0\rangle|a\rangle$ you’re acting on. You may just
assume you can do this circuit, without writing out its details. 

11.d. Note that the computational basis states $|x\rangle$ are the eigenstates of P and hence also of U, so
the only thing you need to do is multiply with the right phases for them. 

11.e. By conjugating with single-qubit gates you can change non-Z Paulis to Zs, in order to reduce 
to the case of (d). 


Chapter 10 


No hints for this chapter, sorry! 


Chapter 11 


4.a. Use Exercise 2. 

4.b. Show that the symmetrized approximate polynomial r induced by the algorithm has degree 
at least N. 

6.c. Use the result of Exercise 5 for N = 2. 

7.b. When defining the relation R, consider that the hardest task for this algorithm is to distinguish 
inputs of weight N/2 from inputs of weight N/2 + 1. 

9.b. Consider the Boolean-valued problem of distinguishing the inputs where 0 sits at an odd
location i in the string x from those where 0 sits at an even location.

10. Show how you can use sorting to solve the Majority-problem and then use the lower bound 
from Exercise 7 to get an $\Omega(N)$ lower bound on sorting. (It is actually known that sorting takes
$\Omega(N \log N)$ comparisons even on a quantum computer, but you don’t have to show that.)

11.a. Reduce the bs(f)-bit OR function (restricted to inputs of weight 0 or 1) to f and invoke the 
lower bound that we know for OR. 

12.b. Use induction on T and triangle inequality. 

12.d. Add up the inequalities of (b) and (c) over all 7, and use the Cauchy-Schwarz inequality. 

13.b. Compare the expected value of a monomial of degree $\le 2T$ under distributions U and D,
and then use the fact that a polynomial is a sum of monomials. 


Chapter 12 


2.a. Use that $I = P_0 P_0 + P_1 P_1$.

2.b. Note that t = T here, so we’re considering the final states of an algorithm that, for every 
input x, outputs the wrong value 1 − f(x) with probability $\le \varepsilon$.

2.c. Use (a), the fact that $P_{f(x)} P_{1-f(y)} + P_{f(y)} P_{1-f(x)} = I$ whenever $f(x) \ne f(y)$, and the fact that
$\Gamma_{xy} = 0$ whenever $f(x) = f(y)$.

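2.d. Cauchy-Schwarz and the definition of operator norm imply
$|\langle\phi|M|\psi\rangle| \le \||\phi\rangle\| \cdot \|M|\psi\rangle\| \le \||\phi\rangle\| \cdot \|M\| \cdot \||\psi\rangle\|$ for all matrices M and vectors $|\phi\rangle$, $|\psi\rangle$.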

3.b. Note that the |¢{)’s are pairwise orthogonal due to having a different basis state in their query 
register. 

3.c. Show and use that $I - O_{x,\pm} O_{y,\pm} = 2 \sum_{i : x_i \ne y_i} P_i$. Also use that $(\Gamma_i)_{xy} = 0$ whenever $x_i = y_i$.


3.d. Cauchy-Schwarz and the definition of operator norm imply |(4{|(I@T;)|¢%)| < Z 8 Tillie ||’. 

4. Use that X and $C - \sum_j y_j A_j$ are both psd.


Chapter 13 


1. Use binary search, running the algorithm with different choices of k to “zoom in” on the largest 
prime factor. 

3.a. Use the last item of Appendix B.2 to make the error probability exponentially small. 

3.c. Use the error analysis of Exercise 4.4. 

4.a. Write $|\psi_x\rangle = \alpha|0\rangle|\phi_0\rangle + \beta|1\rangle|\phi_1\rangle$, and consider the inner product between $(Z \otimes I)|\psi_x\rangle$ and $|\psi_x\rangle$.

4.b. Use part (a). Analyze the amplitude of |x, 097”) in the final state $|\psi_x\rangle$, using ideas from the
proof of BQP $\subseteq$ PSPACE in Section 13.3. Note that in contrast to that proof, you cannot use
more than polynomial time for this exercise. 


Chapter 14 


3. Use binary search with different values of a,b to zoom in on the right value. 

5.b. Argue about the penalty given by $H_{clock}$, which can’t be larger than $\lambda_{\min}$.


5.d. Use $\langle\psi''|H|\psi''\rangle = \|\sqrt{H}|\psi''\rangle\|^2$ and triangle inequality.

5.e. Use Eq. (14.3).

5.f. Use (e) and Cauchy-Schwarz.

5.g. Sum (f) over all $(T+1)^2$ pairs $t, t'$ to lower bound $\langle\psi'|\psi''\rangle$.

5.j. Once all three simplifying assumptions have been satisfied, we can invoke the energy lower
bound of $\Omega(1/T^2)$ proved in Section 14.3.1.

6.a. Invoke the Marriott-Watrous result mentioned at the end of Section 14.1 (without proving it). 
7. Combine ideas from Exercises 6.b and 13.4. 

8.a. Note that from the description of H you can infer the circuit $C_n = U_T \cdots U_1$ for n-bit instances,
and you can then run $U_t \cdots U_1$ in a controlled manner, for every $t \in \{0, \ldots, T\}$ of your choice.


Chapter 15 


2.a. It suffices to use pure states with real amplitudes as encoding. Try to “spread out” the 4 
encodings $|\phi_{00}\rangle$, $|\phi_{01}\rangle$, $|\phi_{10}\rangle$, $|\phi_{11}\rangle$ in the 2-dimensional real plane as well as possible.

3. Use the fact that 1 classical bit of communication can only send 1 bit of information, no matter 
how much entanglement Alice and Bob share. Combine this fact with superdense coding. 

6.b. Think of the first n qubits as Alice and the last n qubits as Bob; use Holevo’s theorem. 

7.a. Consider the positive and negative eigenvalues in the spectral decomposition of $\rho_0 - \rho_1$, and
analyze the success probability minus the error probability. 


Chapter 16 


1. Argue that if Alice sends the same message for distinct inputs $x$ and $x'$, then Bob doesn’t know
what to output if his input is $y = x$.

2.a. Argue that if P is a projector then we can’t have both $P|\phi\rangle = |\phi\rangle$ and $P|\psi\rangle = 0$.

2.c. Observe that among Alice’s possible n-bit inputs are the n codewords of the Hadamard code 
that encodes log n bits (see Section 15.3); each pair of distinct Hadamard codewords is at Hamming
distance exactly n/2. Use part (a) to argue that Alice needs to send pairwise orthogonal states for 
those n inputs, and hence her message-space must have dimension at least n. 
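
The distance property quoted in 2.c is easy to confirm by brute force for small n. The sketch below assumes the usual definition of the Hadamard code, where the codeword of a k-bit string z lists the inner products of z with all k-bit strings x modulo 2; the function names are hypothetical.

```python
from itertools import combinations


def hadamard_codewords(k: int) -> list[list[int]]:
    """All 2**k codewords of length n = 2**k.

    Position x of the codeword for z holds the inner product <x, z> mod 2,
    computed here as the parity of the bitwise AND.
    """
    n = 1 << k
    return [[bin(x & z).count("1") % 2 for x in range(n)] for z in range(n)]


def hamming_distance(u: list[int], v: list[int]) -> int:
    return sum(a != b for a, b in zip(u, v))


k = 4
words = hadamard_codewords(k)
n = len(words[0])
distances = {hamming_distance(u, v) for u, v in combinations(words, 2)}

if __name__ == "__main__":
    print(n, len(words), distances)
```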

3. Use the fact that 2 non-orthogonal states cannot be distinguished perfectly (Exercise 2), and 
that a set of $2^n$ vectors that are pairwise orthogonal must have dimension $2^n$.

5. Invoke the quantum random access lower bound, Theorem 4 of Section 15.2. 

6. Partition the n positions into disjoint sets and run (in parallel) a separate r-message intersection 
protocol for each of these sets. 

8.b. Let Alice send a random row of C(x) (with the row-index) and let Bob send a random column 
of C(y) (with the column-index). 

9.a. Two distinct polynomials, each of degree $\le d$, are equal on at most d points of the domain $F_p$.
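
The fact in 9.a can be checked by brute force on a small field. In the sketch below the prime p = 101 and the two cubic polynomials are arbitrary choices made for illustration; g was picked so that g − f = (x − 1)(x − 2)(x − 3), which shows that the bound of d agreement points can be attained.

```python
def evaluate(coeffs: list[int], x: int, p: int) -> int:
    """Horner evaluation of sum(coeffs[i] * x**i) modulo the prime p."""
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % p
    return result


p = 101
f = [3, 0, 5, 1]        # 3 + 5x^2 + x^3
g = [98, 11, 100, 2]    # f + (x - 1)(x - 2)(x - 3), coefficients mod 101
d = 3

agreements = [x for x in range(p) if evaluate(f, x, p) == evaluate(g, x, p)]

if __name__ == "__main__":
    print(agreements, len(agreements) <= d)
```

So a uniformly random point of the field exposes the difference between two distinct polynomials except with probability at most d/p.
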
10.b. Run the protocol of part (a) on an initial state where Bob has a well-chosen superposition
over many $|y\rangle$.

11.b. You can derive this from one of the communication lower bounds mentioned in this chapter, 
you don’t need to prove this from scratch. 

12. The matching M induces a projective measurement that Bob can do on the message he receives. 
13.d. Alice could send a uniform superposition over all h € H. 


Chapter 17 


1.b. You could write this out, but you can also get the answer almost immediately from part (a)
and the fact that $H^T = H^{-1}$.

2.b. It’s helpful here to write the EPR-pair in the basis $|+\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$, $|-\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$.
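
The change of basis suggested in 2.b can be confirmed numerically. The sketch below represents states as plain lists of real amplitudes (the helper names `kron` and `scaled_sum` are hypothetical) and compares the EPR-pair written in the computational basis with the same combination written in the $|+\rangle$, $|-\rangle$ basis.

```python
import math

s = 1 / math.sqrt(2)
zero, one = [1.0, 0.0], [0.0, 1.0]
plus, minus = [s, s], [s, -s]


def kron(u: list[float], v: list[float]) -> list[float]:
    """Tensor product of two state vectors."""
    return [a * b for a in u for b in v]


def scaled_sum(u: list[float], v: list[float], scale: float) -> list[float]:
    return [scale * (a + b) for a, b in zip(u, v)]


# (|00> + |11>) / sqrt(2)
epr_computational = scaled_sum(kron(zero, zero), kron(one, one), s)
# (|++> + |--> ) / sqrt(2)
epr_plus_minus = scaled_sum(kron(plus, plus), kron(minus, minus), s)

same_state = all(
    abs(a - b) < 1e-12 for a, b in zip(epr_computational, epr_plus_minus)
)

if __name__ == "__main__":
    print(epr_computational, epr_plus_minus, same_state)
```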


4. For every fixed input x,y, there is a classical strategy that gives a wrong output only on that 
input, and that gives a correct output on all other possible inputs. Use the shared randomness to
randomly choose one of those deterministic strategies. 

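6.b. Argue that $\frac{1}{4}\langle\psi|C|\psi\rangle = \Pr[\text{win}] - \Pr[\text{lose}]$.

6.c. Use that $A^2$ and $B^2$ are the k-qubit identity matrix.

6.d. Use Cauchy-Schwarz to show $(\langle\psi|C|\psi\rangle)^2 \le \langle\psi|C^2|\psi\rangle$, and then upper bound the latter.

6.e. $\cos(\pi/8)^2 = \frac{1}{2} + \frac{1}{\sqrt{8}}$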
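
The number in 6.e can be put next to the best classical winning probability with a few lines of code. The sketch below assumes the standard CHSH winning condition (outputs a, b win on inputs x, y when a XOR b equals x AND y, with uniformly random inputs); the function name is hypothetical. It enumerates all 16 deterministic strategies, which suffices because shared randomness only mixes deterministic strategies.

```python
import math
from itertools import product


def best_classical_chsh() -> float:
    """Maximum winning probability over deterministic strategies a(x), b(y)."""
    best = 0.0
    for a0, a1, b0, b1 in product((0, 1), repeat=4):
        a, b = (a0, a1), (b0, b1)
        wins = sum((a[x] ^ b[y]) == (x & y) for x in (0, 1) for y in (0, 1))
        best = max(best, wins / 4)
    return best


classical_value = best_classical_chsh()
quantum_value = math.cos(math.pi / 8) ** 2

if __name__ == "__main__":
    print(classical_value, quantum_value, 0.5 + 1 / math.sqrt(8))
```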


Chapter 18 


6. Use the encoding of Exercise 5, so that Alice and Bob need to cooperate to learn the key used
to change $\rho$.

7. Show that a unitary on Alice’s side of the state won’t change Bob’s local density matrix $\rho_B$.

9.a. The singular value decomposition (see end of Appendix A.5) of the $d \times d$ matrix M whose
entries are $M_{ij} = \alpha_{ij}$ can be computed in polynomial time; you may assume this without proof.


Chapter 19 


6.a. Start with a uniform superposition over all $z \in \{0,1\}^n$ and end with an inverse QFT. You’re
allowed to use a unitary like $|c\rangle \mapsto e^{2\pi i c}|c\rangle$ since it does not depend on f.

7.a. You can diagonalize V by something like a Hadamard gate on the “middle two” basis states,
$|01\rangle$ and $|10\rangle$.

7.b. W consists of k 2-qubit SWAP-gates. 

7.c. It’s helpful to write $U^{\eta} = I + i\rho\eta + F$ for some matrix F with $\|F\|_1 = O(\eta^2)$. This follows
from Taylor series and the fact that $\rho$ has trace 1; you don’t need to prove this. Here the trace
norm $\|A\|_1$ of a matrix A is defined as the sum of A’s singular values.

7.d. You can first prove this for the case where $\sigma = |a\rangle\langle a|$ and $\rho = |b\rangle\langle b|$ are pure states, and then
extend to general mixed states by linearity. 

7.f. Apply part (d) $r = O(t^2/\varepsilon)$ times with $\eta = O(\varepsilon/t)$, choosing the constants in the $O(\cdot)$ such
that $r\eta = t$ and hence $(U^{\eta})^r = U^t$. Upper bound the overall error using triangle inequality.

8.a. Use the SWAP-test from Section 16.6. “O(1) given copies” means you are allowed to use any 
number of copies of $|\phi\rangle$ and $|\psi\rangle$, as long as that number is independent of n. You may count the
3-qubit gate which is the controlled SWAP of a pair of qubits as an elementary gate here. 


Chapter 20 


1. Compute the trace $\mathrm{Tr}(E^* E)$ in two ways, and use the fact that Tr(AB) = 0 if A and B are
distinct Paulis, and Tr(AB) = Tr(I) = 2 if A and B are the same Pauli.
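
The trace facts quoted in hint 1 can be verified directly on the four single-qubit Pauli matrices. The sketch below uses nested Python lists with complex entries; the helper name `trace_of_product` is hypothetical.

```python
from itertools import product

PAULIS = {
    "I": [[1, 0], [0, 1]],
    "X": [[0, 1], [1, 0]],
    "Y": [[0, -1j], [1j, 0]],
    "Z": [[1, 0], [0, -1]],
}


def trace_of_product(A, B) -> complex:
    """Tr(AB) for 2x2 matrices, i.e. the sum over i, k of A[i][k] * B[k][i]."""
    return sum(A[i][k] * B[k][i] for i in range(2) for k in range(2))


traces = {
    (a, b): trace_of_product(PAULIS[a], PAULIS[b])
    for a, b in product(PAULIS, repeat=2)
}

if __name__ == "__main__":
    for pair, value in traces.items():
        print(pair, value)
```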

5. Given an unknown qubit $\alpha|0\rangle + \beta|1\rangle$ encoded using this code, you could split the 2k qubits into
two sets of k qubits each, and use each to recover a copy of the unknown qubit. 